
Fine-Tuning and Inference of the LLaMA 65B Model with LoRA

這篇具有很好參考價值的文章主要介紹了使用 LoRA 技術(shù)對 LLaMA 65B 大模型進行微調(diào)及推理。希望對大家有所幫助。如果存在錯誤或未考慮完全的地方,請大家不吝賜教,您也可以點擊"舉報違法"按鈕提交疑問。

前幾天,Meta 發(fā)布了 LIMA 大模型,在LLaMA-65B的基礎(chǔ)上,無需使用 RLHF,只用了 1000 個精心準備的樣本數(shù)據(jù)進行微調(diào),就達到了和 GPT-4 相媲美的程度。這激發(fā)了我探索 LLaMA 65B 大模型的興趣。

My earlier posts in this series all fine-tuned the LLaMA 7B/13B weights; this article uses LoRA to fine-tune the LLaMA 30B/65B models. The related code is available on GitHub: llm-action.

環(huán)境準備

基礎(chǔ)環(huán)境配置如下:

  • 操作系統(tǒng): CentOS 7
  • CPUs: 單個節(jié)點具有 1TB 內(nèi)存的 Intel CPU,物理CPU個數(shù)為64,每顆CPU核數(shù)為16
  • GPUs: 8 卡 A800 80GB GPUs
  • Python: 3.10 (需要先升級OpenSSL到1.1.1t版本( 點擊下載OpenSSL),然后再編譯安裝Python), 點擊下載Python
  • NVIDIA驅(qū)動程序版本: 515.65.01,根據(jù)不同型號選擇不同的驅(qū)動程序, 點擊下載。
  • CUDA工具包: 11.7, 點擊下載
  • NCCL: nccl_2.14.3-1+cuda11.7, 點擊下載
  • cuDNN: 8.8.1.3_cuda11, 點擊下載

本文的實驗環(huán)境與足夠驚艷,使用Alpaca-Lora基于LLaMA(7B)二十分鐘完成微調(diào),效果比肩斯坦福羊駝一文中的實驗環(huán)境一致,因此不再贅述。

直接激活虛擬環(huán)境。

source /home/guodong.li/virtual-venv/alpara-lora-venv-py310-cu117/bin/activate

數(shù)據(jù)集準備

數(shù)據(jù)集直接使用alpaca-lora項目提供的alpaca_data.json、alpaca_data_cleaned_archive.jsonalpaca_data_gpt4.json即可。除此之外,可參考GPT-4-LLM項目,該項目還提供了使用Alpaca的Prompt翻譯成中文使用 GPT4 生成了 5.2 萬條指令跟隨數(shù)據(jù)。

模型格式轉(zhuǎn)換

首先,對原始的 LLaMA 30B/65B 大模型進行模型格式轉(zhuǎn)換。模型轉(zhuǎn)換的具體步驟請參考之前的文章:從0到1復(fù)現(xiàn)斯坦福羊駝(Stanford Alpaca 7B)

Original LLaMA 65B model weights:

> tree llama-model/65B/
llama-model/65B/
├── checklist.chk
├── consolidated.00.pth
...
├── consolidated.07.pth
└── params.json

0 directories, 10 files

轉(zhuǎn)換HF格式后的 LLaMA 65B 模型權(quán)重:

ls -al hf-llama-model/llama-65b/ hf-llama-model/tokenizer/
hf-llama-model/llama-65b/:
total 127511452
drwxrwxr-x 1 nobody nobody          0 Mar 27 20:44 .
drwxrwxr-x 1 nobody nobody          0 Mar 27 20:35 ..
-rw-rw-r-- 1 nobody nobody        426 Mar 27 20:44 config.json
-rw-rw-r-- 1 nobody nobody        124 Mar 27 20:44 generation_config.json
-rw-rw-r-- 1 nobody nobody 1619037191 Mar 27 20:38 pytorch_model-00001-of-00081.bin
...
-rw-rw-r-- 1 nobody nobody 1048593571 Mar 27 20:44 pytorch_model-00081-of-00081.bin
-rw-rw-r-- 1 nobody nobody      63494 Mar 27 20:44 pytorch_model.bin.index.json

hf-llama-model/tokenizer/:
total 500
drwxrwxr-x 1 nobody nobody      0 Mar 30 10:53 .
drwxrwxr-x 1 nobody nobody      0 Mar 27 20:35 ..
-rw-rw-r-- 1 nobody nobody      2 Mar 30 10:53 special_tokens_map.json
-rw-rw-r-- 1 nobody nobody    141 Mar 30 10:53 tokenizer_config.json
-rw-rw-r-- 1 nobody nobody 499723 Mar 30 10:53 tokenizer.model

然后,將tokenizer目錄的文件拷貝到llama-65B目錄下。

cp hf-llama-model/tokenizer/* hf-llama-model/llama-65b/

LLaMA 30B 的轉(zhuǎn)換工作與之類似,不再贅述。

模型微調(diào)

LLaMA-30B

首先,對 LLaMA 30B 進行微調(diào),30B 參數(shù)的模型大約60G左右。在A800上面 micro_batch_size 為 6 能夠充分利用顯存資源。

模型訓(xùn)練過程:

torchrun --nproc_per_node=8 --master_port=29005 finetune.py \
> --base_model '/data/nfs/guodong.li/pretrain/hf-llama-model/llama-30b' \
> --data_path '/data/nfs/guodong.li/data/alpaca_data_cleaned.json' \
> --output_dir '/home/guodong.li/output/alpaca-lora-30b-dp' \
> --batch_size 96 \
> --micro_batch_size 6 \
> --num_epochs 2

CUDA?SETUP:?CUDA?runtime?path?found:?/usr/local/cuda-11.7/lib64/libcudart.so
CUDA?SETUP:?Highest?compute?capability?among?GPUs?detected:?8.0
CUDA?SETUP:?Detected?CUDA?version?117
CUDA?SETUP:?Loading?binary?/home/guodong.li/virtual-venv/alpara-lora-venv-py310-cu117/lib/python3.10/site-packages/bitsandbytes-0.37.2-py3.10.egg/bitsandbytes/libbitsandbytes_cuda117.so...
Training?Alpaca-LoRA?model?with?params:
base_model:?/data/nfs/guodong.li/pretrain/hf-llama-model/llama-30b
data_path:?/data/nfs/guodong.li/data/alpaca_data_cleaned.json
output_dir:?/home/guodong.li/output/alpaca-lora-30b-dp
batch_size:?96
micro_batch_size:?6
num_epochs:?2
learning_rate:?0.0003
cutoff_len:?256
val_set_size:?2000
lora_r:?8
lora_alpha:?16
lora_dropout:?0.05
lora_target_modules:?['q_proj',?'v_proj']
train_on_inputs:?True
group_by_length:?False
wandb_project:
wandb_run_name:
wandb_watch:
wandb_log_model:
resume_from_checkpoint:?False
prompt?template:?alpaca

...

Loading?checkpoint?shards:?100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|?61/61?[02:11<00:00,??2.16s/it]
Loading?checkpoint?shards:?100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|?61/61?[02:12<00:00,??2.17s/it]
Found?cached?dataset?json?(/home/guodong.li/.cache/huggingface/datasets/json/default-2dab63d15cf49261/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|?1/1?[00:00<00:00,?187.05it/s]
trainable?params:?12779520?||?all?params:?32541723136?||?trainable%:?0.03927118409369777

...

Loading?cached?split?indices?for?dataset?at?/home/guodong.li/.cache/huggingface/datasets/json/default-2dab63d15cf49261/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-d8c5d7ac95d53860.arrow?and?/home/guodong.li/.cache/huggingface/datasets/json/default-2dab63d15cf49261/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-4a34b0c9feb19e72.arrow
Map:???4%|█████▍????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????|?1904/49942?[00:01<00:38,?1244.61?examples/s]Found?cached?dataset?json?(/home/guodong.li/.cache/huggingface/datasets/json/default-2dab63d15cf49261/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|?1/1?[00:00<00:00,?193.31it/s]
trainable?params:?12779520?||?all?params:?32541723136?||?trainable%:?0.03927118409369777
Map:???9%|████████████▊?????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????|?4513/49942?[00:03<00:32,?1402.69?examples/s]Loading?cached?split?indices?for?dataset?at?/home/guodong.li/.cache/huggingface/datasets/json/default-2dab63d15cf49261/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-d8c5d7ac95d53860.arrow?and?/home/guodong.li/.cache/huggingface/datasets/json/default-2dab63d15cf49261/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-4a34b0c9feb19e72.arrow
Map:??66%|█████████████████████████████████████████████████████████████████████████████████████████████▌???????????????????????????????????????????????|?33152/49942?[00:24<00:12,?1340.03?examples/s]Found?cached?dataset?json?(/home/guodong.li/.cache/huggingface/datasets/json/default-2dab63d15cf49261/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|?1/1?[00:00<00:00,?561.56it/s]
trainable?params:?12779520?||?all?params:?32541723136?||?trainable%:?0.03927118409369777
Loading?cached?split?indices?for?dataset?at?/home/guodong.li/.cache/huggingface/datasets/json/default-2dab63d15cf49261/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-d8c5d7ac95d53860.arrow?and?/home/guodong.li/.cache/huggingface/datasets/json/default-2dab63d15cf49261/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-4a34b0c9feb19e72.arrow
Map:??67%|██████████████████████████████████████████████████████████████████████████████████████████████▍??????????????????????????????????????????????|?33433/49942?[00:24<00:12,?1371.96?examples/s]Found?cached?dataset?json?(/home/guodong.li/.cache/huggingface/datasets/json/default-2dab63d15cf49261/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|?1/1?[00:00<00:00,?627.33it/s]
Map:??40%|█████████████████████████████████████████████████████████????????????????????????????????????????????????????????????????????????????????????|?20222/49942?[00:16<00:26,?1104.62?examples/s]trainable?params:?12779520?||?all?params:?32541723136?||?trainable%:?0.03927118409369777
Loading?cached?split?indices?for?dataset?at?/home/guodong.li/.cache/huggingface/datasets/json/default-2dab63d15cf49261/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-d8c5d7ac95d53860.arrow?and?/home/guodong.li/.cache/huggingface/datasets/json/default-2dab63d15cf49261/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-4a34b0c9feb19e72.arrow
{'loss':?2.0954,?'learning_rate':?2.9999999999999997e-05,?'epoch':?0.02}
{'loss':?1.984,?'learning_rate':?5.6999999999999996e-05,?'epoch':?0.04}
{'loss':?1.7062,?'learning_rate':?8.4e-05,?'epoch':?0.06}
{'loss':?1.3441,?'learning_rate':?0.00011399999999999999,?'epoch':?0.08}
{'loss':?1.1435,?'learning_rate':?0.00014099999999999998,?'epoch':?0.1}
{'loss':?0.9968,?'learning_rate':?0.00017099999999999998,?'epoch':?0.12}
{'loss':?0.9275,?'learning_rate':?0.000201,?'epoch':?0.13}
...
{'loss':?0.812,?'learning_rate':?0.00026904255319148935,?'epoch':?0.38}
{'eval_loss':?0.8141900897026062,?'eval_runtime':?28.5046,?'eval_samples_per_second':?70.164,?'eval_steps_per_second':?1.123,?'epoch':?0.38}
{'loss':?0.8016,?'learning_rate':?0.0002658510638297872,?'epoch':?0.4}
{'loss':?0.8024,?'learning_rate':?0.0002626595744680851,?'epoch':?0.42}
{'loss':?0.7938,?'learning_rate':?0.000259468085106383,?'epoch':?0.44}
...
{'loss':?0.793,?'learning_rate':?0.00021478723404255316,?'epoch':?0.71}
{'loss':?0.7884,?'learning_rate':?0.00021159574468085105,?'epoch':?0.73}
{'loss':?0.7748,?'learning_rate':?0.00020840425531914894,?'epoch':?0.75}
{'loss':?0.7869,?'learning_rate':?0.00020521276595744677,?'epoch':?0.77}
{'eval_loss':?0.8041278719902039,?'eval_runtime':?28.2371,?'eval_samples_per_second':?70.829,?'eval_steps_per_second':?1.133,?'epoch':?0.77}
{'loss':?0.7846,?'learning_rate':?0.00020202127659574466,?'epoch':?0.79}
{'loss':?0.791,?'learning_rate':?0.00019882978723404255,?'epoch':?0.81}
{'loss':?0.7923,?'learning_rate':?0.00019563829787234039,?'epoch':?0.83}
...
{'loss':?0.7775,?'learning_rate':?0.0001573404255319149,?'epoch':?1.06}
{'loss':?0.7883,?'learning_rate':?0.00015414893617021278,?'epoch':?1.08}
{'loss':?0.7805,?'learning_rate':?0.0001509574468085106,?'epoch':?1.1}
{'loss':?0.7955,?'learning_rate':?0.0001477659574468085,?'epoch':?1.11}
{'loss':?0.7801,?'learning_rate':?0.00014457446808510636,?'epoch':?1.13}
{'loss':?0.7933,?'learning_rate':?0.00014138297872340425,?'epoch':?1.15}
{'eval_loss':?0.8008487820625305,?'eval_runtime':?28.9576,?'eval_samples_per_second':?69.066,?'eval_steps_per_second':?1.105,?'epoch':?1.15}
{'loss':?0.785,?'learning_rate':?0.0001381914893617021,?'epoch':?1.17}
{'loss':?0.7686,?'learning_rate':?0.000135,?'epoch':?1.19}
{'loss':?0.7717,?'learning_rate':?0.00013180851063829786,?'epoch':?1.21}
...
{'loss':?0.7688,?'learning_rate':?8.393617021276595e-05,?'epoch':?1.5}
{'loss':?0.7785,?'learning_rate':?8.074468085106383e-05,?'epoch':?1.52}
{'loss':?0.7767,?'learning_rate':?7.75531914893617e-05,?'epoch':?1.54}
{'eval_loss':?0.7986326813697815,?'eval_runtime':?28.3196,?'eval_samples_per_second':?70.622,?'eval_steps_per_second':?1.13,?'epoch':?1.54}
{'loss':?0.7907,?'learning_rate':?7.436170212765956e-05,?'epoch':?1.56}
{'loss':?0.7691,?'learning_rate':?7.117021276595744e-05,?'epoch':?1.58}
...
{'loss':?0.7649,?'learning_rate':?1.6914893617021273e-05,?'epoch':?1.9}
{'loss':?0.7624,?'learning_rate':?1.3723404255319146e-05,?'epoch':?1.92}
{'eval_loss':?0.7973329424858093,?'eval_runtime':?29.2014,?'eval_samples_per_second':?68.49,?'eval_steps_per_second':?1.096,?'epoch':?1.92}
{'loss':?0.7824,?'learning_rate':?1.0531914893617022e-05,?'epoch':?1.94}
{'loss':?0.7772,?'learning_rate':?7.3404255319148934e-06,?'epoch':?1.96}
{'loss':?0.7762,?'learning_rate':?4.148936170212765e-06,?'epoch':?1.98}
{'loss':?0.7572,?'learning_rate':?9.574468085106382e-07,?'epoch':?2.0}
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|?1040/1040?[1:18:34<00:00,??4.44s/it]
{'train_runtime':?4716.2302,?'train_samples_per_second':?21.179,?'train_steps_per_second':?0.221,?'train_loss':?0.8336130522764646,?'epoch':?2.0}
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|?1040/1040?[1:18:34<00:00,??4.53s/it]

模型權(quán)重文件:

> tree -h  /home/guodong.li/output/alpaca-lora-30b-dp
/home/guodong.li/output/alpaca-lora-30b-dp
├── [ 424]  adapter_config.json
├── [ 49M]  adapter_model.bin
└── [4.0K]  checkpoint-1000
    ├── [ 98M]  optimizer.pt
    ├── [ 49M]  pytorch_model.bin
    ├── [ 14K]  rng_state_0.pth
    ├── [ 14K]  rng_state_1.pth
    ├── [ 14K]  rng_state_2.pth
    ├── [ 14K]  rng_state_3.pth
    ├── [ 14K]  rng_state_4.pth
    ├── [ 14K]  rng_state_5.pth
    ├── [ 14K]  rng_state_6.pth
    ├── [ 14K]  rng_state_7.pth
    ├── [ 557]  scaler.pt
    ├── [ 627]  scheduler.pt
    ├── [ 13K]  trainer_state.json
    └── [3.5K]  training_args.bin

1 directory, 16 files

As the log shows, with data parallelism across 8 A800 GPUs and roughly 50K samples, a single epoch takes about 40 minutes.

LLaMA-65B

首先,對 LLaMA 65B 進行微調(diào),65B 參數(shù)的模型大約120G左右。為了讓單卡A800能夠跑65B的大模型,這里將micro_batch_size設(shè)置為1。

模型訓(xùn)練過程:

torchrun --nproc_per_node=8 --master_port=29005 finetune.py \
> --base_model '/data/nfs/guodong.li/pretrain/hf-llama-model/llama-65b' \
> --data_path '/data/nfs/guodong.li/data/alpaca_data_cleaned.json' \
> --output_dir '/home/guodong.li/output/alpaca-lora-65b-dp' \
> --batch_size 8 \
> --micro_batch_size 1 \
> --num_epochs 1
...
CUDA?SETUP:?CUDA?runtime?path?found:?/usr/local/cuda-11.7/lib64/libcudart.so
CUDA?SETUP:?Highest?compute?capability?among?GPUs?detected:?8.0
CUDA?SETUP:?Detected?CUDA?version?117
CUDA?SETUP:?Loading?binary?/home/guodong.li/virtual-venv/alpara-lora-venv-py310-cu117/lib/python3.10/site-packages/bitsandbytes-0.37.2-py3.10.egg/bitsandbytes/libbitsandbytes_cuda117.so...
Training?Alpaca-LoRA?model?with?params:
base_model:?/data/nfs/guodong.li/pretrain/hf-llama-model/llama-65b
data_path:?/data/nfs/guodong.li/data/alpaca_data_cleaned.json
output_dir:?/home/guodong.li/output/alpaca-lora-65b-dp
batch_size:?8
micro_batch_size:?1
num_epochs:?1
learning_rate:?0.0003
cutoff_len:?256
val_set_size:?2000
lora_r:?8
lora_alpha:?16
lora_dropout:?0.05
lora_target_modules:?['q_proj',?'v_proj']
train_on_inputs:?True
group_by_length:?False
wandb_project:
wandb_run_name:
wandb_watch:
wandb_log_model:
resume_from_checkpoint:?False
prompt?template:?alpaca

Loading?checkpoint?shards:?100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|?81/81?[02:06<00:00,??1.56s/it]
Loading?checkpoint?shards:?100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|?81/81?[02:20<00:00,??1.74s/it]
...
Map:??13%|█████████████████▉????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????????|?6312/49942?[00:04<00:30,?1410.98?examples/s]Found?cached?dataset?json?(/home/guodong.li/.cache/huggingface/datasets/json/default-2dab63d15cf49261/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e)
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|?1/1?[00:00<00:00,?196.47it/s]
trainable?params:?20971520?||?all?params:?65306632192?||?trainable%:?0.03211238934867168
Loading?cached?split?indices?for?dataset?at?/home/guodong.li/.cache/huggingface/datasets/json/default-2dab63d15cf49261/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-d8c5d7ac95d53860.arrow?and?/home/guodong.li/.cache/huggingface/datasets/json/default-2dab63d15cf49261/0.0.0/fe5dd6ea2639a6df622901539cb550cf8797e5a6b2dd7af1cf934bed8e233e6e/cache-4a34b0c9feb19e72.arrow
{'loss':?2.1086,?'learning_rate':?2.3999999999999997e-05,?'epoch':?0.0}
{'loss':?2.0261,?'learning_rate':?5.399999999999999e-05,?'epoch':?0.0}
{'loss':?1.7054,?'learning_rate':?8.4e-05,?'epoch':?0.0}
{'loss':?1.2423,?'learning_rate':?0.00011099999999999999,?'epoch':?0.01}
{'loss':?0.9976,?'learning_rate':?0.00013199999999999998,?'epoch':?0.01}
{'loss':?0.801,?'learning_rate':?0.000162,?'epoch':?0.01}
{'loss':?0.839,?'learning_rate':?0.00019199999999999998,?'epoch':?0.01}
{'loss':?0.8134,?'learning_rate':?0.00022199999999999998,?'epoch':?0.01}
{'loss':?0.7575,?'learning_rate':?0.00025199999999999995,?'epoch':?0.01}
...
{'loss':?0.769,?'learning_rate':?0.0001992023441315318,?'epoch':?0.35}
{'loss':?0.7393,?'learning_rate':?0.00019871398339573498,?'epoch':?0.35}
{'loss':?0.7269,?'learning_rate':?0.0001982256226599381,?'epoch':?0.35}
{'loss':?0.6783,?'learning_rate':?0.00019773726192414128,?'epoch':?0.35}
{'eval_loss':?0.7974867820739746,?'eval_runtime':?48.5181,?'eval_samples_per_second':?41.222,?'eval_steps_per_second':?0.66,?'epoch':?0.35}
{'loss':?0.6891,?'learning_rate':?0.00019724890118834445,?'epoch':?0.35}
{'loss':?0.7216,?'learning_rate':?0.0001967605404525476,?'epoch':?0.36}
{'loss':?0.7114,?'learning_rate':?0.00019627217971675075,?'epoch':?0.36}
{'loss':?0.7089,?'learning_rate':?0.0001957838189809539,?'epoch':?0.36}
...
{'loss':?0.6985,?'learning_rate':?5.323132020185577e-06,?'epoch':?0.98}
{'loss':?0.7167,?'learning_rate':?4.834771284388734e-06,?'epoch':?0.99}
{'loss':?0.7433,?'learning_rate':?4.346410548591893e-06,?'epoch':?0.99}
{'loss':?0.6875,?'learning_rate':?3.8580498127950505e-06,?'epoch':?0.99}
{'loss':?0.7104,?'learning_rate':?3.369689076998209e-06,?'epoch':?0.99}
{'loss':?0.7346,?'learning_rate':?2.881328341201367e-06,?'epoch':?0.99}
{'loss':?0.7062,?'learning_rate':?2.3929676054045255e-06,?'epoch':?0.99}
{'eval_loss':?0.787121593952179,?'eval_runtime':?48.4232,?'eval_samples_per_second':?41.303,?'eval_steps_per_second':?0.661,?'epoch':?0.99}
{'loss':?0.701,?'learning_rate':?1.9046068696076832e-06,?'epoch':?0.99}
{'loss':?0.7169,?'learning_rate':?1.4162461338108414e-06,?'epoch':?1.0}
{'loss':?0.763,?'learning_rate':?9.278853980139996e-07,?'epoch':?1.0}
{'loss':?0.6903,?'learning_rate':?4.3952466221715773e-07,?'epoch':?1.0}
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|?6243/6243?[4:36:50<00:00,??2.42s/it]
{'train_runtime':?16612.2434,?'train_samples_per_second':?3.006,?'train_steps_per_second':?0.376,?'train_loss':?0.7368283385404043,?'epoch':?1.0}
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|?6243/6243?[4:36:50<00:00,??2.66s/it]

顯存占用:

Tue?May?23?17:05:37?2023
+-----------------------------------------------------------------------------+
|?NVIDIA-SMI?515.105.01???Driver?Version:?515.105.01???CUDA?Version:?11.7?????|
|-------------------------------+----------------------+----------------------+
|?GPU??Name????????Persistence-M|?Bus-Id????????Disp.A?|?Volatile?Uncorr.?ECC?|
|?Fan??Temp??Perf??Pwr:Usage/Cap|?????????Memory-Usage?|?GPU-Util??Compute?M.?|
|???????????????????????????????|??????????????????????|???????????????MIG?M.?|
|===============================+======================+======================|
|???0??NVIDIA?A800?80G...??Off??|?00000000:34:00.0?Off?|????????????????????0?|
|?N/A???67C????P0???296W?/?300W?|??78543MiB?/?81920MiB?|????100%??????Default?|
|???????????????????????????????|??????????????????????|?????????????Disabled?|
+-------------------------------+----------------------+----------------------+
|???1??NVIDIA?A800?80G...??Off??|?00000000:35:00.0?Off?|????????????????????0?|
|?N/A???69C????P0???303W?/?300W?|??78577MiB?/?81920MiB?|????100%??????Default?|
|???????????????????????????????|??????????????????????|?????????????Disabled?|
+-------------------------------+----------------------+----------------------+
|???2??NVIDIA?A800?80G...??Off??|?00000000:36:00.0?Off?|????????????????????0?|
|?N/A???70C????P0???300W?/?300W?|??78657MiB?/?81920MiB?|????100%??????Default?|
|???????????????????????????????|??????????????????????|?????????????Disabled?|
+-------------------------------+----------------------+----------------------+
|???3??NVIDIA?A800?80G...??Off??|?00000000:37:00.0?Off?|????????????????????0?|
|?N/A???72C????P0???297W?/?300W?|??78577MiB?/?81920MiB?|????100%??????Default?|
|???????????????????????????????|??????????????????????|?????????????Disabled?|
+-------------------------------+----------------------+----------------------+
|???4??NVIDIA?A800?80G...??Off??|?00000000:9B:00.0?Off?|????????????????????0?|
|?N/A???71C????P0???292W?/?300W?|??78641MiB?/?81920MiB?|????100%??????Default?|
|???????????????????????????????|??????????????????????|?????????????Disabled?|
+-------------------------------+----------------------+----------------------+
|???5??NVIDIA?A800?80G...??Off??|?00000000:9C:00.0?Off?|????????????????????0?|
|?N/A???71C????P0???305W?/?300W?|??78629MiB?/?81920MiB?|????100%??????Default?|
|???????????????????????????????|??????????????????????|?????????????Disabled?|
+-------------------------------+----------------------+----------------------+
|???6??NVIDIA?A800?80G...??Off??|?00000000:9D:00.0?Off?|????????????????????0?|
|?N/A???68C????P0???296W?/?300W?|??78625MiB?/?81920MiB?|????100%??????Default?|
|???????????????????????????????|??????????????????????|?????????????Disabled?|
+-------------------------------+----------------------+----------------------+
|???7??NVIDIA?A800?80G...??Off??|?00000000:9E:00.0?Off?|????????????????????0?|
|?N/A???68C????P0???298W?/?300W?|??78799MiB?/?81920MiB?|????100%??????Default?|
|???????????????????????????????|??????????????????????|?????????????Disabled?|
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
|?Processes:??????????????????????????????????????????????????????????????????|
|??GPU???GI???CI????????PID???Type???Process?name??????????????????GPU?Memory?|
|????????ID???ID???????????????????????????????????????????????????Usage??????|
|=============================================================================|
|????0???N/A??N/A?????33369??????C???...nv-py310-cu117/bin/python????78541MiB?|
|????1???N/A??N/A?????33370??????C???...nv-py310-cu117/bin/python????78575MiB?|
|????2???N/A??N/A?????33371??????C???...nv-py310-cu117/bin/python????78655MiB?|
|????3???N/A??N/A?????33372??????C???...nv-py310-cu117/bin/python????78575MiB?|
|????4???N/A??N/A?????33373??????C???...nv-py310-cu117/bin/python????78639MiB?|
|????5???N/A??N/A?????33374??????C???...nv-py310-cu117/bin/python????78627MiB?|
|????6???N/A??N/A?????33375??????C???...nv-py310-cu117/bin/python????78623MiB?|
|????7???N/A??N/A?????33376??????C???...nv-py310-cu117/bin/python????78797MiB?|
+-----------------------------------------------------------------------------+

模型權(quán)重:

> tree -h /home/guodong.li/output/alpaca-lora-65b-dp
/home/guodong.li/output/alpaca-lora-65b-dp
├── [ 424]  adapter_config.json
├── [ 80M]  adapter_model.bin
└── [4.0K]  checkpoint-6200
    ├── [160M]  optimizer.pt
    ├── [ 80M]  pytorch_model.bin
    ├── [ 14K]  rng_state_0.pth
    ├── [ 14K]  rng_state_1.pth
    ├── [ 14K]  rng_state_2.pth
    ├── [ 14K]  rng_state_3.pth
    ├── [ 14K]  rng_state_4.pth
    ├── [ 14K]  rng_state_5.pth
    ├── [ 14K]  rng_state_6.pth
    ├── [ 14K]  rng_state_7.pth
    ├── [ 557]  scaler.pt
    ├── [ 627]  scheduler.pt
    ├── [ 80K]  trainer_state.json
    └── [3.5K]  training_args.bin

1 directory, 16 files

As the log shows, with data parallelism across 8 A800 GPUs and roughly 50K samples, a single epoch takes about 4.5 hours.

將 LoRA 權(quán)重合并回基礎(chǔ)模型

下面將 LoRA 權(quán)重合并回基礎(chǔ)模型,以便于進行模型推理。具體可參考足夠驚艷,使用Alpaca-Lora基于LLaMA(7B)二十分鐘完成微調(diào),效果比肩斯坦福羊駝一文修改export_hf_checkpoint.py文件。

權(quán)重合并過程:

BASE_MODEL=/data/nfs/guodong.li/pretrain/hf-llama-model/llama-65b \
> LORA_MODEL=/home/guodong.li/output/alpaca-lora-65b-dp \
> HF_CHECKPOINT=/home/guodong.li/output/hf_65b_ckpt \
> python export_hf_checkpoint.py

===================================BUG?REPORT===================================
Welcome?to?bitsandbytes.?For?bug?reports,?please?submit?your?error?trace?to:?https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/home/guodong.li/virtual-venv/alpara-lora-venv-py310-cu117/lib/python3.10/site-packages/bitsandbytes-0.37.2-py3.10.egg/bitsandbytes/cuda_setup/main.py:136:?UserWarning:?WARNING:?The?following?directories?listed?in?your?path?were?found?to?be?non-existent:?{PosixPath('/opt/rh/devtoolset-9/root/usr/lib/dyninst'),?PosixPath('/opt/rh/devtoolset-7/root/usr/lib/dyninst')}
??warn(msg)
CUDA?SETUP:?CUDA?runtime?path?found:?/usr/local/cuda-11.7/lib64/libcudart.so
CUDA?SETUP:?Highest?compute?capability?among?GPUs?detected:?8.0
CUDA?SETUP:?Detected?CUDA?version?117
CUDA?SETUP:?Loading?binary?/home/guodong.li/virtual-venv/alpara-lora-venv-py310-cu117/lib/python3.10/site-packages/bitsandbytes-0.37.2-py3.10.egg/bitsandbytes/libbitsandbytes_cuda117.so...
Loading?checkpoint?shards:?100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████|?81/81?[01:15<00:00,??1.08it/s]

合并后的權(quán)重文件:

> tree -h hf_65b_ckpt
hf_65b_ckpt
├── [ 580]  config.json
├── [ 137]  generation_config.json
├── [ 537]  pytorch_model-00001-of-00403.bin
├── [500M]  pytorch_model-00002-of-00403.bin
├── [256M]  pytorch_model-00003-of-00403.bin
├── [256M]  pytorch_model-00004-of-00403.bin
├── [344M]  pytorch_model-00005-of-00403.bin
├── [344M]  pytorch_model-00006-of-00403.bin
├── [344M]  pytorch_model-00007-of-00403.bin
...
├── [344M]  pytorch_model-00400-of-00403.bin
├── [344M]  pytorch_model-00401-of-00403.bin
├── [344M]  pytorch_model-00402-of-00403.bin
├── [500M]  pytorch_model-00403-of-00403.bin
└── [ 65K]  pytorch_model.bin.index.json

0 directories, 406 files

Model Inference

接下來使用轉(zhuǎn)換后的模型權(quán)重進行模型推理,具體的模型推理(inference.py)代碼如下所示:

from transformers import LlamaForCausalLM, AutoTokenizer
import torch

# Fall back to CPU when no GPU is available.
device = torch.device("cuda:2") if torch.cuda.is_available() else torch.device("cpu")

tokenizer_path = "/data/nfs/guodong.li/pretrain/hf-llama-model/tokenizer"
model_path = "/home/guodong.li/output/hf_65b_ckpt"  # path of the merged local model

# Load the merged 65B checkpoint with 8-bit quantization, letting accelerate
# place the layers across the available GPUs.
model = LlamaForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, load_in_8bit=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)

# Simple interactive loop: read a line, wrap it in a Human/Assistant prompt, generate a reply.
print("Human:")
line = input()
while line:
    inputs = 'Human: ' + line.strip() + '\n\nAssistant:'
    input_ids = tokenizer(inputs, return_tensors="pt").input_ids
    input_ids = input_ids.to(device)  # move the prompt tokens onto the GPU
    outputs = model.generate(input_ids, max_new_tokens=500, do_sample=True, top_k=30, top_p=0.85,
                             temperature=0.5, repetition_penalty=1.0,
                             eos_token_id=2, bos_token_id=1, pad_token_id=0)
    rets = tokenizer.batch_decode(outputs, skip_special_tokens=True, clean_up_tokenization_spaces=False)
    print("Assistant:\n" + rets[0].strip().replace(inputs, ""))
    print("\n------------------------------------------------\nHuman:")
    line = input()
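
One detail worth noting: fine-tuning above used the alpaca prompt template, while this script wraps inputs in a plain Human:/Assistant: format. If you want inference-time prompts to match the training distribution, a hypothetical drop-in replacement for the inputs = ... line is:

# Optional: wrap the user input in the Alpaca template used during fine-tuning.
ALPACA_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)
inputs = ALPACA_TEMPLATE.format(instruction=line.strip())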

運行推理代碼:

> python inference.py

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
/home/guodong.li/virtual-venv/alpara-lora-venv-py310-cu117/lib/python3.10/site-packages/bitsandbytes-0.37.2-py3.10.egg/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/opt/rh/devtoolset-9/root/usr/lib/dyninst'), PosixPath('/opt/rh/devtoolset-7/root/usr/lib/dyninst')}
  warn(msg)
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.7/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.0
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /home/guodong.li/virtual-venv/alpara-lora-venv-py310-cu117/lib/python3.10/site-packages/bitsandbytes-0.37.2-py3.10.egg/bitsandbytes/libbitsandbytes_cuda117.so...
Loading checkpoint shards: 100%|████████████████████████████████████████| 402/402 [01:48<00:00,  3.72it/s]
Human:
What are the five characteristics of a good argument?
Assistant:
 A good argument should be clear, concise, logical, supported by evidence, and respectful of the opposing view.

------------------------------------------------
Human:
Generate a list of ten common idioms related to animals.
Assistant:
 1. "Like a fish out of water" 2. "Birds of a feather flock together" 3. "Let the cat out of the bag" 4. "Herding cats" 5. "Barking up the wrong tree" 6. "Sly as a fox" 7. "A lion's share" 8. "A bird in the hand is worth two in the bush" 9. "A wolf in sheep's clothing" 10. "A wild goose chase".

------------------------------------------------
Human:
Evaluate the following expression: (6+2)*(2-2).
Assistant:
 10.

------------------------------------------------
Human:
Compute the derivative of 3x^3 + 10x.
Assistant:
 The derivative of 3x^3 + 10x is 9x^2 + 10.

------------------------------------------------
Human:

顯存占用:

+-----------------------------------------------------------------------------+
|?NVIDIA-SMI?515.105.01???Driver?Version:?515.105.01???CUDA?Version:?11.7?????|
|-------------------------------+----------------------+----------------------+
|?GPU??Name????????Persistence-M|?Bus-Id????????Disp.A?|?Volatile?Uncorr.?ECC?|
|?Fan??Temp??Perf??Pwr:Usage/Cap|?????????Memory-Usage?|?GPU-Util??Compute?M.?|
|???????????????????????????????|??????????????????????|???????????????MIG?M.?|
|===============================+======================+======================|
|???0??NVIDIA?A800?80G...??Off??|?00000000:34:00.0?Off?|????????????????????0?|
|?N/A???44C????P0????69W?/?300W?|??66927MiB?/?81920MiB?|??????0%??????Default?|
|???????????????????????????????|??????????????????????|?????????????Disabled?|
+-------------------------------+----------------------+----------------------+
...
+-------------------------------+----------------------+----------------------+
|???7??NVIDIA?A800?80G...??Off??|?00000000:9E:00.0?Off?|????????????????????0?|
|?N/A???47C????P0????71W?/?300W?|???7224MiB?/?81920MiB?|??????0%??????Default?|
|???????????????????????????????|??????????????????????|?????????????Disabled?|
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
|?Processes:??????????????????????????????????????????????????????????????????|
|??GPU???GI???CI????????PID???Type???Process?name??????????????????GPU?Memory?|
|????????ID???ID???????????????????????????????????????????????????Usage??????|
|=============================================================================|
|????0???N/A??N/A?????43499??????C???python??????????????????????????66925MiB?|
|????1???N/A??N/A?????43499??????C???python????????????????????????????949MiB?|
...
|????7???N/A??N/A?????43499??????C???python????????????????????????????949MiB?|
+-----------------------------------------------------------------------------+

As the output above shows, even with the model loaded in half precision (and with 8-bit quantization enabled in the script above), a single card still uses more than 60 GB of memory. If your hardware cannot accommodate this, consider model-parallel inference: the tensor_parallel and FasterTransformer projects both support running LLaMA with model parallelism. Beyond simply fitting the model, for models at the tens-of-billions-of-parameters scale and above, model parallelism is also worth using to improve inference speed and throughput.
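
As a rough illustration of the first option, the tensor_parallel package can shard an already-loaded model across several GPUs. The snippet below is only a sketch under the assumption that the package's tensor_parallel entry point and generate work as documented for LLaMA-class models; the paths are the same ones used above:

import torch
import tensor_parallel as tp
from transformers import LlamaForCausalLM, AutoTokenizer

model_path = "/home/guodong.li/output/hf_65b_ckpt"
tokenizer_path = "/data/nfs/guodong.li/pretrain/hf-llama-model/tokenizer"

# Load on CPU first, then shard the fp16 weights across four GPUs instead of one card.
model = LlamaForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16)
model = tp.tensor_parallel(model, ["cuda:0", "cuda:1", "cuda:2", "cuda:3"])
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)

input_ids = tokenizer("Human: What is LoRA?\n\nAssistant:", return_tensors="pt").input_ids.to("cuda:0")
with torch.inference_mode():
    outputs = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))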

結(jié)語

本文講述了使用 LoRA 高效微調(diào)技術(shù)對 LLaMA 30B/65B 進行模型訓(xùn)練及推理,希望能夠給你帶來幫助。

參考文檔:文章來源地址http://www.zghlxwxcb.cn/news/detail-474918.html

  • 從0到1復(fù)現(xiàn)斯坦福羊駝(Stanford Alpaca 7B)
  • 足夠驚艷,使用Alpaca-Lora基于LLaMA(7B)二十分鐘完成微調(diào),效果比肩斯坦福羊駝
  • Alpaca-LoRA

到了這里,關(guān)于使用 LoRA 技術(shù)對 LLaMA 65B 大模型進行微調(diào)及推理的文章就介紹完了。如果您還想了解更多內(nèi)容,請在右上角搜索TOY模板網(wǎng)以前的文章或繼續(xù)瀏覽下面的相關(guān)文章,希望大家以后多多支持TOY模板網(wǎng)!

本文來自互聯(lián)網(wǎng)用戶投稿,該文觀點僅代表作者本人,不代表本站立場。本站僅提供信息存儲空間服務(wù),不擁有所有權(quán),不承擔相關(guān)法律責任。如若轉(zhuǎn)載,請注明出處: 如若內(nèi)容造成侵權(quán)/違法違規(guī)/事實不符,請點擊違法舉報進行投訴反饋,一經(jīng)查實,立即刪除!

領(lǐng)支付寶紅包贊助服務(wù)器費用

相關(guān)文章

覺得文章有用就打賞一下文章作者

支付寶掃一掃打賞

博客贊助

微信掃一掃打賞

請作者喝杯咖啡吧~博客贊助

支付寶掃一掃領(lǐng)取紅包,優(yōu)惠每天領(lǐng)

二維碼1

領(lǐng)取紅包

二維碼2

領(lǐng)紅包