Llama2 is an open-source large language model. GitHub repository:
facebookresearch/llama (Inference code for LLaMA models): https://github.com/facebookresearch/llama
Chinese community repository:
FlagAlpha/Llama2-Chinese (Llama Chinese community; a fully open-source, commercially usable Chinese Llama model): https://github.com/FlagAlpha/Llama2-Chinese
Below I share how to deploy this model on Linux. I first tried Windows, but Llama2 cannot run on the GPU there; if you want GPU support, consider MLC-LLM, another open-source project that can run Llama2:
https://mlc.ai/mlc-llm/docs/get_started/try_out.html
Deploying on Linux
I deployed on the AutoDL compute platform. Checking the docs before deploying, I found that the 13B and 70B variants demand too much compute, so I went with 7B.
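As a rough back-of-the-envelope estimate (my own, not from the official docs): fp16 weights take 2 bytes per parameter, so the weights alone need about 7B × 2 ≈ 14 GB for 7B, 26 GB for 13B, and 140 GB for 70B, before counting activations and the KV cache. The 8-bit loading used in the demo below roughly halves the weight memory, which is why 7B fits a single mid-range GPU comfortably while the larger variants do not.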
1. Clone the GitHub repository
git clone https://github.com/facebookresearch/llama.git
2. Enter the llama directory
cd llama
3. Install dependencies
pip install -e .
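Before moving on, it is worth a quick sanity check that PyTorch can see the GPU (a minimal check; assumes PyTorch is already installed on the instance):

python -c "import torch; print(torch.cuda.is_available())"

It should print True; if it prints False, the demo below will fail when moving tensors to 'cuda'.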
4. Demo code
However, I found that running the official demo as-is raises an HTTPError, which essentially says you do not have permission to access meta-llama/Llama-2-7b-chat-hf, because the model is downloaded on the fly from Hugging Face (an open model hub, https://huggingface.co/). So the code needs a small modification. (The official demo follows.)
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the model with 8-bit quantization, spreading weights across available devices
model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-2-7b-chat-hf', device_map='auto', torch_dtype=torch.float16, load_in_8bit=True)
model = model.eval()

tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-chat-hf', use_fast=False)
tokenizer.pad_token = tokenizer.eos_token

# Prompt in the chat format the model expects; the question means "Introduce China"
input_ids = tokenizer(['<s>Human: 介紹一下中國(guó)\n</s><s>Assistant: '], return_tensors="pt", add_special_tokens=False).input_ids.to('cuda')
generate_input = {
    "input_ids": input_ids,
    "max_new_tokens": 512,        # upper bound on the generated length
    "do_sample": True,            # sample instead of greedy decoding
    "top_k": 50,
    "top_p": 0.95,
    "temperature": 0.3,
    "repetition_penalty": 1.3,
    "eos_token_id": tokenizer.eos_token_id,
    "bos_token_id": tokenizer.bos_token_id,
    "pad_token_id": tokenizer.pad_token_id
}
generate_ids = model.generate(**generate_input)
text = tokenizer.decode(generate_ids[0])
print(text)
The fix is to modify the two from_pretrained calls (the AutoModelForCausalLM line and the AutoTokenizer line) to pass your token:
model = AutoModelForCausalLM.from_pretrained('meta-llama/Llama-2-7b-chat-hf', device_map='auto', torch_dtype=torch.float16, load_in_8bit=True, use_auth_token="YOUR_TOKEN")
tokenizer = AutoTokenizer.from_pretrained('meta-llama/Llama-2-7b-chat-hf', use_fast=False, use_auth_token="YOUR_TOKEN")
We need to supply a token here. To get one, open the page below, register and log in, then generate a token; apparently only a token with write permission works.
https://huggingface.co/settings/tokens
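Alternatively, instead of hard-coding the token in the script, you can log in once on the machine and let transformers pick up the cached credential automatically (assumes the huggingface_hub package, which provides this CLI, is installed):

pip install huggingface_hub
huggingface-cli login   # paste your token when prompted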
If an error says you are missing the transformers or accelerate package, just pip install them:
pip install transformers
pip install accelerate
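One more dependency worth mentioning: load_in_8bit=True in the demo relies on the bitsandbytes library, so if you see an error about 8-bit quantization, install it the same way:

pip install bitsandbytes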
Once the model finishes downloading (it takes a while), you are set. PS: Llama-2-7b-chat-hf is tuned mainly for chat; there are also Llama-2-7b-hf, Llama-2-13b-chat-hf, Llama-2-13b-hf, and other variants, so download whichever fits your needs.
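If you would rather download the weights once and then load them from a local path (handy when the connection is slow), here is a minimal sketch using huggingface_hub's snapshot_download (assumes a recent huggingface_hub; the local directory name is just an example):

from huggingface_hub import snapshot_download

# Download the whole model repo once; later runs can load from this directory
snapshot_download(
    repo_id='meta-llama/Llama-2-7b-chat-hf',
    local_dir='./Llama-2-7b-chat-hf',  # example path, pick your own
    token='YOUR_TOKEN',                # the same Hugging Face token as above
)

Afterwards, pass './Llama-2-7b-chat-hf' to from_pretrained instead of the hub name.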
Then run the script and it will print the result.
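If you want to ask the model several questions without reloading the weights each time, a small interactive loop around the same generate call works; this is just a sketch reusing the model, tokenizer, and prompt format from the demo above:

# Sketch: interactive loop reusing the already-loaded model and tokenizer
while True:
    question = input('Human: ')
    if not question:          # empty input exits the loop
        break
    prompt = f'<s>Human: {question}\n</s><s>Assistant: '
    input_ids = tokenizer([prompt], return_tensors='pt',
                          add_special_tokens=False).input_ids.to('cuda')
    generate_ids = model.generate(input_ids=input_ids, max_new_tokens=512,
                                  do_sample=True, top_k=50, top_p=0.95,
                                  temperature=0.3, repetition_penalty=1.3,
                                  eos_token_id=tokenizer.eos_token_id,
                                  bos_token_id=tokenizer.bos_token_id,
                                  pad_token_id=tokenizer.pad_token_id)
    print(tokenizer.decode(generate_ids[0]))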