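This test changes the inference parameters of the TinyLlama-1.1B-Chat-v1.0 model and observes how those changes relate to inference time.
Local environment:
Processor: Intel(R) Core(TM) i5-8400 CPU @ 2.80GHz 2.80 GHz
Installed RAM: 16.0 GB (15.9 GB usable)
Integrated graphics: Intel(R) UHD Graphics 630
Discrete graphics: NVIDIA GeForce GTX 1050
The main call whose parameters are varied: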
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
Source code (mirror): https://hf-mirror.com/TinyLlama/TinyLlama-1.1B-Chat-v1.0
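To make the sampling parameters in that call concrete, here is a toy sketch of what top_k and top_p do to a next-token distribution. It is an illustration only, not the transformers implementation: the function name and the toy distribution are hypothetical, and details such as temperature scaling and renormalization between the two steps are left out. Filtering of this kind touches only the final distribution at each step, while the model's forward pass is the expensive part.

```python
def top_k_top_p_filter(probs, top_k=50, top_p=0.95):
    """Toy filter: keep the top_k most likely tokens, then the smallest
    prefix of them whose cumulative probability reaches top_p."""
    # Rank tokens by probability and keep at most top_k of them.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Hypothetical four-token vocabulary.
toy = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
filtered = top_k_top_p_filter(toy, top_k=3, top_p=0.9)
narrow = top_k_top_p_filter(toy, top_k=3, top_p=0.7)
print(sorted(filtered), sorted(narrow))
```

With do_sample=False none of this applies, because the most likely token is taken at every step.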
'''
https://hf-mirror.com/TinyLlama/TinyLlama-1.1B-Chat-v1.0
TinyLlama 1.1B tests well, much better than even the quantized Qwen 1.8B.
'''
# Install transformers from source - only needed for versions <= v4.34
# pip install git+https://github.com/huggingface/transformers.git
# pip install accelerate
import os
from datetime import datetime
import torch
os.environ['TF_ENABLE_ONEDNN_OPTS'] = '0'
from transformers import pipeline
'''
pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.bfloat16, device_map="auto")
# We use the tokenizer's chat template to format each message - see https://hf-mirror.com/docs/transformers/main/en/chat_templating
messages = [
    {
        "role": "system",
        "content": "You are a friendly chatbot who always responds in the style of a pirate",
    },
    # {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
    {"role": "user", "content": "你叫什么名字?"},  # "What is your name?"
]
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
'''
# <|system|>
# You are a friendly chatbot who always responds in the style of a pirate.</s>
# <|user|>
# How many helicopters can a human eat in one sitting?</s>
# <|assistant|>
# ...
def load_pipeline():
    """Load the text-generation pipeline once so it can be reused for every prompt."""
    pipe = pipeline("text-generation", model="TinyLlama/TinyLlama-1.1B-Chat-v1.0", torch_dtype=torch.bfloat16,
                    device_map="auto")
    return pipe
def generate_text(content, length=20):
    """
    Generate text for the given prompt and print how long generation took.
    Note: `length` is currently unused.
    """
    messages = [
        {
            # The original run used the role "提示", which is not a standard chat role;
            # the sample output shows no <|system|> block, so that message was apparently ignored.
            "role": "system",
            "content": "This is a friendly chatbot...",
        },
        # {"role": "user", "content": "How many helicopters can a human eat in one sitting?"},
        {"role": "user", "content": content},
    ]
    prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
    datetime1 = datetime.now()
    outputs = pipe(prompt, max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
    print(outputs[0]["generated_text"])
    datetime2 = datetime.now()
    time12_interval = datetime2 - datetime1
    print("Elapsed time", time12_interval)
    # Extra timing runs with other parameter settings; change False to True to enable them.
    if False:
        outputs = pipe(prompt, max_new_tokens=32, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
        print(outputs[0]["generated_text"])
        datetime3 = datetime.now()
        time23_interval = datetime3 - datetime2
        print("Elapsed time 2", time23_interval)
        outputs = pipe(prompt, max_new_tokens=32, do_sample=False, top_k=50)
        print(outputs[0]["generated_text"])
        datetime4 = datetime.now()
        time34_interval = datetime4 - datetime3
        print("Elapsed time 3", time34_interval)
        outputs = pipe(prompt, max_new_tokens=32, do_sample=True, temperature=0.7, top_k=30, top_p=0.95)
        print(outputs[0]["generated_text"])
        datetime5 = datetime.now()
        time45_interval = datetime5 - datetime4
        print("Elapsed time 4", time45_interval)
        outputs = pipe(prompt, max_new_tokens=32, do_sample=False, top_k=30)
        print(outputs[0]["generated_text"])
        datetime6 = datetime.now()
        time56_interval = datetime6 - datetime5
        print("Elapsed time 5", time56_interval)
        outputs = pipe(prompt, max_new_tokens=12, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
        print(outputs[0]["generated_text"])
        datetime7 = datetime.now()
        time67_interval = datetime7 - datetime6
        print("Elapsed time 6", time67_interval)
    '''
    Conclusions: changing top_p does not noticeably reduce inference time. For the same question
    asked in Chinese and in English, the Chinese version takes about twice as long.
    Setting do_sample=False barely reduces inference time.
    Only max_new_tokens reduces inference time noticeably, but inference time does not scale linearly with max_new_tokens.
    For example, with max_new_tokens=256 inference takes 2 minutes;
    with max_new_tokens=32 it only drops to about 1 minute.
    So it is better to set max_new_tokens larger to get a more complete answer.
    '''
    return outputs
if __name__ == "__main__":
    # Entry point: load the model once, then answer prompts until the user types 'exit'.
    # `pipe` is a module-level name here, which generate_text() reads as a global.
    pipe = load_pipeline()
    # print('load pipe ok')
    while True:
        prompt = input("Enter a prompt (or type 'exit' to quit): ")
        if prompt.lower() == 'exit':
            break
        try:
            generated_text = generate_text(prompt)
            print("Generated text:")
            print(generated_text[0]["generated_text"])
        except Exception as e:
            print("An error occurred:", e)
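The script above times each call with paired datetime.now() readings. A more compact way to compare several parameter sets is a small helper built on time.perf_counter. The sketch below is only one possible arrangement: time_call, compare_settings and fake_generate are hypothetical names, and fake_generate is a stand-in that sleeps in proportion to max_new_tokens so the example runs without downloading a model.

```python
import time


def time_call(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def compare_settings(generate, prompt, settings):
    """Time generate(prompt, **kwargs) once for each named kwargs dict."""
    timings = {}
    for name, kwargs in settings.items():
        _, timings[name] = time_call(generate, prompt, **kwargs)
    return timings


def fake_generate(prompt, max_new_tokens=256, **_ignored):
    """Stand-in for the real pipeline: sleeps 1 ms per requested token."""
    time.sleep(0.001 * max_new_tokens)
    return [{"generated_text": prompt}]


settings = {
    "sample_256": dict(max_new_tokens=256, do_sample=True, temperature=0.7, top_k=50, top_p=0.95),
    "sample_32": dict(max_new_tokens=32, do_sample=True, temperature=0.7, top_k=50, top_p=0.95),
    "greedy_32": dict(max_new_tokens=32, do_sample=False),
}
timings = compare_settings(fake_generate, "How do I open a door?", settings)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.3f} s")
```

To time the real model, pass the loaded pipe in place of fake_generate, since the script already calls it as pipe(prompt, ...) with the same keyword arguments.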
Enter a prompt (or type 'exit' to quit): 如何開門?
<|user|>
如何開門?</s>
<|assistant|>
Certainly! Opening a door is a simple process that involves several steps. Here are the general steps to follow to open a door:
1. Turn off the lock: Turn off the lock with the key by pressing the "lock" button.
2. Press the handle: Use the handle to push the door open. If the door is mechanical, you may need to turn a knob or pull the door handle to activate the door.
3. Release the latch: Once the door is open, release the latch by pulling it backward.
4. Slide the door: Slide the door forward by pushing it against the wall with your feet or using a push bar.
5. Close the door: Once the door is open, close it by pressing the lock button or pulling the handle backward.
6. Use a second key: If the lock has a second key, make sure it is properly inserted and then turn it to the correct position to unlock the door.
Remember to always double-check the locks before opening a door, as some locks can be tricky to open. If you're unsure about the correct procedure for opening a door,
Elapsed time 0:04:23.561065
Generated text:
<|user|>
如何開門?</s>
<|assistant|>
Certainly! Opening a door is a simple process that involves several steps. Here are the general steps to follow to open a door:
1. Turn off the lock: Turn off the lock with the key by pressing the "lock" button.
2. Press the handle: Use the handle to push the door open. If the door is mechanical, you may need to turn a knob or pull the door handle to activate the door.
3. Release the latch: Once the door is open, release the latch by pulling it backward.
4. Slide the door: Slide the door forward by pushing it against the wall with your feet or using a push bar.
5. Close the door: Once the door is open, close it by pressing the lock button or pulling the handle backward.
6. Use a second key: If the lock has a second key, make sure it is properly inserted and then turn it to the correct position to unlock the door.
Remember to always double-check the locks before opening a door, as some locks can be tricky to open. If you're unsure about the correct procedure for opening a door,
Enter a prompt (or type 'exit' to quit):
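The timings quoted in the script's notes (about 2 minutes at max_new_tokens=256, about 1 minute at max_new_tokens=32) are what a fixed overhead plus a per-token cost would produce. The sketch below fits that two-parameter model through the two quoted points and also computes the throughput of the run shown above. It rests on assumptions: the quoted times are approximate, both runs are assumed to have used their full token budget, and the sample run is assumed to have produced all 256 tokens, which its mid-sentence cutoff suggests. Two points always determine a line, so this illustrates the shape of the relationship rather than confirming it. The function name is hypothetical.

```python
from datetime import timedelta


def fit_affine_cost(n1, t1, n2, t2):
    """Fit t = overhead + per_token * n through two (tokens, seconds) points."""
    per_token = (t2 - t1) / (n2 - n1)
    overhead = t1 - per_token * n1
    return overhead, per_token


# Approximate figures from the notes: 32 tokens in ~60 s, 256 tokens in ~120 s.
overhead, per_token = fit_affine_cost(32, 60.0, 256, 120.0)
print(f"fixed overhead: {overhead:.1f} s, per-token cost: {per_token:.3f} s")

# Sample run above: elapsed time 0:04:23.561065 with max_new_tokens=256.
sample = timedelta(minutes=4, seconds=23, microseconds=561065)
tokens_per_second = 256 / sample.total_seconds()
print(f"sample run throughput: {tokens_per_second:.2f} tokens/s")
```

Under those assumptions the fit gives roughly 51 s of fixed cost and roughly 0.27 s per token, and the sample run works out to just under one token per second.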