Topic of this post: LLaMA model data, training time, power consumption, and carbon emissions
LLaMA: Open and Efficient Foundation Language Models
paper https://arxiv.org/pdf/2302.13971v1.pdf
1 Training Data
Overall, our entire training dataset contains roughly 1.4T tokens after tokenization. For most of our training data, each token is used only once during training, with the exception of the Wikipedia
and Books domains, over which we perform approximately two epochs.
- The sources and proportions of the training data are listed in the paper's pre-training data table; after cleaning and deduplication, roughly 1.4T tokens remain (1.4T = 1.4e12).
- The number of passes over each subset is given in the Epochs column: most data is seen only about once, but the Books and Wikipedia subsets are trained on for roughly two epochs (presumably because this data is of higher quality). A rough illustration of the epoch arithmetic follows below.
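As a sketch of how a sampling proportion turns into an effective epoch count, the snippet below uses Wikipedia as an example; the proportion and subset size are illustrative assumptions, not figures quoted in this post.

```python
# Illustrative only: how a sampling proportion maps to effective epochs.
# The Wikipedia proportion and size below are assumed values for this example.
total_tokens = 1.4e12      # total tokens seen during training (from the paper)
wiki_proportion = 0.045    # assumed share of Wikipedia in the sampled batches
wiki_unique_tokens = 28e9  # assumed unique Wikipedia tokens after deduplication

tokens_drawn = total_tokens * wiki_proportion
epochs = tokens_drawn / wiki_unique_tokens
print(f"effective epochs over Wikipedia: {epochs:.2f}")  # ~2.25, i.e. roughly two passes
```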
2 Training Time
When training a 65B-parameter model, our code processes around 380 tokens/sec/GPU on 2048 A100 GPU with 80GB of RAM. This means that training over our dataset containing 1.4T tokens takes approximately 21 days.
Training the 65B-parameter model:
Number of GPUs: 2048
GPU model: A100, 80 GB
Training data: 1.4T tokens
Per-GPU throughput: 380 tokens/s/GPU
Training time: about 21 days (computed as follows)
$t = \dfrac{1.4 \times 10^{12}}{2048 \times 380 \times 24 \times 3600} \approx 21 \text{ days}$
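A quick numerical check of this estimate, using only the numbers quoted above:

```python
# Sanity check of the training-time estimate for the 65B model.
total_tokens = 1.4e12   # tokens in the training set
num_gpus = 2048         # A100-80GB GPUs
throughput = 380        # tokens per second per GPU

seconds = total_tokens / (num_gpus * throughput)
days = seconds / (24 * 3600)
print(f"estimated training time: {days:.1f} days")  # ~20.8 days, i.e. about 21 days
```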
3 Carbon Emissions
- Estimated energy use in watt-hours (Wh):
  $\text{Wh} = \text{GPU-h} \times (\text{GPU power draw in W}) \times \text{PUE}$
  where PUE is the Power Usage Effectiveness of the data center.
- The carbon emission estimate is then:
  $\text{tCO}_2\text{eq} = \text{MWh} \times 0.385$
We estimate that we used 2048 A100-80GB for a period of approximately 5 months to develop our models. This means that developing these models would have cost around 2,638 MWh under our assumptions, and a total emission of 1,015 tCO2eq.
In other words, development used 2048 A100 80GB GPUs for roughly 5 months, consuming about 2,638 MWh of energy and emitting about 1,015 tCO2eq.
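The two formulas are easy to check numerically. The sketch below assumes a 400 W power draw per A100 and a PUE of 1.1 (assumed values for illustration), applies them to the 65B run from section 2, and then checks that 2,638 MWh converts to roughly 1,015 tCO2eq.

```python
# Energy and carbon estimates from the two formulas above.
# The 400 W GPU power draw and PUE of 1.1 are assumed values.
GPU_POWER_W = 400
PUE = 1.1
TCO2EQ_PER_MWH = 0.385

def energy_mwh(gpu_hours: float) -> float:
    """Wh = GPU-h * GPU power * PUE, converted to MWh."""
    return gpu_hours * GPU_POWER_W * PUE / 1e6

def emissions_tco2eq(mwh: float) -> float:
    """tCO2eq = MWh * 0.385."""
    return mwh * TCO2EQ_PER_MWH

# Illustrative: the 65B training run from section 2 (21 days on 2048 GPUs).
gpu_hours_65b = 2048 * 21 * 24
mwh_65b = energy_mwh(gpu_hours_65b)
print(f"65B run: ~{mwh_65b:.0f} MWh, ~{emissions_tco2eq(mwh_65b):.0f} tCO2eq")

# Check the reported development total against the emission factor.
print(f"2638 MWh -> ~{emissions_tco2eq(2638):.0f} tCO2eq")  # ~1016, close to the reported 1015
```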
4 Thoughts
We hope that releasing these models will help to reduce future carbon emission since the training is already done, and some of the models are relatively small and can be run on a single GPU.
The hope is that more large models will be released openly, so that future work can build on already-trained models instead of training from scratch, reducing duplicated effort and carbon emissions.