Related articles in this series:
Fine-tuning
- Large language models - ChatGLM-Tuning
- Large language models - fine-tuning ChatGLM-6B
- Large language models - Chinese ChatGLM/LLaMA fine-tuning
- Large language models - alpaca-lora
Local knowledge base
- Large language models 2 - document AI explained
- Large language models - DocumentSearch explained
- Large language models - Chinese Langchain
The code discussed in this article is at:
https://github.com/27182812/ChatGLM-LLaMA-chinese-insturct
Chinese instruction data fine-tuning on ChatGLM and LLaMA.
Data
JSON preprocessing
- instruction
- tokenizer
Compared with the earlier article, Large language models - ChatGLM-Tuning, both of these functions are here placed inside a single class in dataprocess; at first glance, the changes needed are almost the same.
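For reference, here is a minimal sketch of what such a dataprocess-style class might look like, assuming the JSON records follow the Alpaca-style instruction/input/output layout; the class and method names are hypothetical and not the repo's actual code.

import json

class DataProcess:
    """Hypothetical sketch: groups prompt building and tokenization in one class."""

    def __init__(self, path, tokenizer, cutoff_len=256):
        # Each record is assumed to look like
        # {"instruction": ..., "input": ..., "output": ...}
        self.records = json.load(open(path, encoding="utf-8"))
        self.tokenizer = tokenizer
        self.cutoff_len = cutoff_len

    def build_prompt(self, record):
        # Concatenate instruction (and optional input) into one prompt string.
        if record.get("input"):
            return f'{record["instruction"]}\n{record["input"]}\n'
        return f'{record["instruction"]}\n'

    def encode(self, record):
        # Tokenize prompt + answer, truncated to the cutoff length.
        text = self.build_prompt(record) + record["output"]
        return self.tokenizer(text, truncation=True, max_length=self.cutoff_len)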
Fine-tuning
- For ChatGLM: finetune.sh
- For LLaMA: test_llama1.py
For ChatGLM the procedure is almost the same as in the earlier article, so the focus here is on LLaMA.
Data
def generate_prompt(data_point):
    # sorry about the formatting disaster gotta move fast
    if data_point["input"]:
        return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{data_point["instruction"]}
### Input:
{data_point["input"]}
### Response:
{data_point["output"]}"""
    else:
        return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{data_point["instruction"]}
### Response:
{data_point["output"]}"""


def tokenize(prompt):
    # there's probably a way to do this with the tokenizer settings
    # but again, gotta move fast
    result = tokenizer(
        prompt,
        truncation=True,
        max_length=CUTOFF_LEN + 1,
        padding="max_length",
    )
    return {
        "input_ids": result["input_ids"][:-1],
        "attention_mask": result["attention_mask"][:-1],
    }
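To make the data format concrete, here is a short usage sketch (the sample record is invented for illustration, and tokenizer / CUTOFF_LEN are assumed to be defined as in the training script below). Note that tokenize pads and truncates to CUTOFF_LEN + 1 and then drops the last token, so every example comes out exactly CUTOFF_LEN tokens long.

# Hypothetical sample record, for illustration only.
sample = {
    "instruction": "把下面的句子翻譯成英文。",
    "input": "今天天氣很好。",
    "output": "The weather is nice today.",
}

prompt = generate_prompt(sample)   # Alpaca-style prompt that ends with the answer
features = tokenize(prompt)        # {"input_ids": [...], "attention_mask": [...]}
print(len(features["input_ids"]))  # == CUTOFF_LEN, after the trailing token is dropped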
Model
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    device_map="auto",
)
tokenizer = LlamaTokenizer.from_pretrained(
    "decapoda-research/llama-7b-hf", add_eos_token=True
)

# Freeze the base model and prepare it for 8-bit LoRA training.
model = prepare_model_for_int8_training(model)

config = LoraConfig(
    r=LORA_R,
    lora_alpha=LORA_ALPHA,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=LORA_DROPOUT,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
tokenizer.pad_token_id = 0  # unk. we want this to be different from the eos token
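The uppercase constants referenced here and in the training call below (CUTOFF_LEN, LORA_R, MICRO_BATCH_SIZE, and so on) are defined elsewhere in the script. The values below are a sketch based on the usual alpaca-lora defaults, not necessarily the exact values used in this repo.

# Assumed alpaca-lora-style hyperparameters; the repo may use different values.
MICRO_BATCH_SIZE = 4                 # per-device batch size
BATCH_SIZE = 128                     # effective batch size
GRADIENT_ACCUMULATION_STEPS = BATCH_SIZE // MICRO_BATCH_SIZE
EPOCHS = 2
LEARNING_RATE = 3e-4
CUTOFF_LEN = 256                     # max tokens per training example
LORA_R = 8                           # LoRA rank
LORA_ALPHA = 16
LORA_DROPOUT = 0.05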
Fine-tuning
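The data variable below already holds the instruction dataset; the loading step is not part of this excerpt. A minimal sketch with Hugging Face datasets might look like the following (the file path reuses the one from the inference section and is an assumption for the training run):

from datasets import load_dataset

# Assumed loading step: yields a DatasetDict with a "train" split, matching data["train"] below.
data = load_dataset("json", data_files="data/zh-data01.json")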
data = data.shuffle().map(lambda x: tokenize(generate_prompt(x)))

trainer = transformers.Trainer(
    model=model,
    train_dataset=data["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=MICRO_BATCH_SIZE,
        gradient_accumulation_steps=GRADIENT_ACCUMULATION_STEPS,
        warmup_steps=100,
        num_train_epochs=EPOCHS,
        learning_rate=LEARNING_RATE,
        fp16=True,
        logging_steps=20,
        output_dir="qys-alpaca-chinese",
        save_total_limit=3,
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
model.config.use_cache = False
trainer.train(resume_from_checkpoint=False)
# trainer.train()
model.save_pretrained("qys-alpaca-chinese")
Inference
- For ChatGLM: infer.py
- For LLaMA: generate_llama1.py
Inference code:
import json
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer, GenerationConfig
from peft import PeftModel

tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
model = LlamaForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map="auto",
)
# Load the LoRA adapter produced by the fine-tuning step on top of the base model.
model = PeftModel.from_pretrained(
    model, "./qys-alpaca-chinese", torch_dtype=torch.float16
)


def generate_prompt(instruction, input=None):
    if input:
        return f"""Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Input:
{input}
### Response:"""
    else:
        return f"""Below is an instruction that describes a task. Write a response that appropriately completes the request.
### Instruction:
{instruction}
### Response:"""


instructions = json.load(open("data/zh-data01.json"))

answers = []
with torch.no_grad():
    for idx, item in enumerate(instructions[12:18]):
        # format_example (defined elsewhere in the repo) turns a record into a prompt "context".
        feature = format_example(item)
        input_text = feature['context']
        print(input_text)
        inputs = tokenizer(input_text, return_tensors="pt")
        input_ids = inputs["input_ids"].cuda()
        generation_config = GenerationConfig(
            temperature=0.1,
            top_p=0.75,
            top_k=40,
            num_beams=4,
        )
        generation_output = model.generate(
            input_ids=input_ids,
            generation_config=generation_config,
            return_dict_in_generate=True,
            output_scores=True,
            max_new_tokens=256,
        )
        s = generation_output.sequences[0]
        output = tokenizer.decode(s)
        print(output.strip())
        print("--------------------------------------------")
That concludes this walkthrough of fine-tuning ChatGLM and LLaMA on Chinese instruction data.