LLM 테스트

1. 허깅 페이스에서 llama-3 중 유명한 모델을 찾아 그중 한국어 파인 튜닝된 모델을 찾았다

> MLP-KTLim/llama-3-Korean-Bllossom-8B

> MLP-KTLim/llama-3-Korean-Bllossom-8B-gguf-Q4_K_M

2. 처음이라 멀 사용해야 할지 몰라 일단 두개 다 다운받았으나 ollama 를 사용할 예정이라 이리 저리 찾아보던중 gguf 를 사용해야한다 해서 아래 모델로 사용

> MLP-KTLim/llama-3-Korean-Bllossom-8B
> MLP-KTLim/llama-3-Korean-Bllossom-8B-gguf-Q4_K_M
두개 차이 알려줘

> Ollama는 llama.cpp 기반으로 동작하며, llama.cpp는 GGUF 포맷을 기본 지원합니다.
> "gguf-Q4_K_M" 모델은 파일 크기가 작고 가벼워서 저사양 GPU나 CPU 환경에서도 인퍼런스가 가능하도록 설계되었습니다.

직접 말아볼라했는데... 굳이 시간 버리는거같아서 시도하다 말았다.ㅋ

3. 리눅스쉘

> pip install -U "huggingface_hub[cli]"

> huggingface-cli download MLP-KTLim/llama-3-Korean-Bllossom-8B-gguf-Q4_K_M --local-dir ./bllossom-model
> curl -fsSL https://ollama.com/install.sh | sh

> mkdir -p ~/.ollama/models

> cp -p ./bllossom-model/llama-3-Korean-Bllossom-8B-Q4_K_M.gguf ~/.ollama/models/

> cd ~/.ollama/models/

> vi Modelfile << 요놈이 중요하다

FROM llama-3-Korean-Bllossom-8B-Q4_K_M.gguf

SYSTEM """당신은 유용한 AI 어시스턴트입니다. 사용자의 질의에 대해 친절하고 정확하게 답변해야 합니다. You are a helpful AI assistant, you'll need to answer users' queries in a friendly and accurate manner. 모든 대답은 한국어(Korean)으로 대답해주세요."""

TEMPLATE """{{- if .System }}
<s>{{ .System }}</s>
{{- end }}
<s>Human: {{ .Prompt }}</s>
<s>Assistant: """

PARAMETER temperature 0.6 # 창의성 정도, 낮을수록 일관된 응답
PARAMETER num_predict 3000 # 한번 생성 가능한 최대 토큰수
PARAMETER num_ctx 4096 # 한번에 처리 가능한 최대 컨텍스트 길이 토큰
PARAMETER stop <s> # 생성중 <s> 만나면 출력 중단
PARAMETER stop </s> # 생성중 </s> 만나면 출력 중단
PARAMETER stop <|eot_id|> # 생성중<|eot_id|> 만나면 출력 중단

> 파일 생성 후

> ollama create ollama-ko-0710 -f .\Modelfile