Installing and Running LLAMA2 on Windows or Mac

Posted 2 years ago | Author: 茫茫人海一粒沙 | Category: Toy blog

This article explains how to install and run LLAMA2 on Windows or Mac. I hope it helps. If you find mistakes or gaps, corrections are welcome.

Results of running LLAMA2 on different systems

LLAMA2 running on Windows:

[Screenshot: LLAMA2 running on Windows]

LLAMA2 running on Mac:

[Screenshot: LLAMA2 running on Mac]

Different ways to install Llama 2

Method 1:

Build llama.cpp

Clone llama.cpp

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp

Create an environment with conda or venv. The following uses conda:

conda create --name llama_test python=3.9
conda activate llama_test

Install the Python dependencies

pip3 install -r requirements.txt

Build llama.cpp

On Mac:

LLAMA_METAL=1 make

On Windows, run make from PowerShell.

Download the Llama 2 model

Download a quantized Llama 2 model in GGUF format directly from Hugging Face:

https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/tree/main

I downloaded llama-2-7b-chat.Q4_0.gguf.

Copy llama-2-7b-chat.Q4_0.gguf into the models directory inside llama.cpp.
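The download and copy steps can also be scripted. The following is a minimal sketch that uses only the Python standard library. It assumes Hugging Face serves raw repository files at the URL pattern https://huggingface.co/<repo>/resolve/<revision>/<filename>. The helper names gguf_url and download_to_models are hypothetical. Run it from the llama.cpp directory so the file is saved to models.

```python
import os
import urllib.request

HF_BASE = "https://huggingface.co"


def gguf_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Direct download URL for one file in a Hugging Face repo (assumed resolve/ URL pattern)."""
    return f"{HF_BASE}/{repo_id}/resolve/{revision}/{filename}"


def download_to_models(repo_id: str, filename: str, models_dir: str = "models") -> str:
    """Hypothetical helper: download a model file straight into llama.cpp's models folder."""
    os.makedirs(models_dir, exist_ok=True)
    dest = os.path.join(models_dir, filename)
    if os.path.exists(dest):
        return dest  # already downloaded, nothing to do
    tmp = dest + ".part"
    urllib.request.urlretrieve(gguf_url(repo_id, filename), tmp)
    os.replace(tmp, dest)  # rename only after the download has finished
    return dest
```

For example, download_to_models("TheBloke/Llama-2-7B-Chat-GGUF", "llama-2-7b-chat.Q4_0.gguf") fetches the same file used above. The file is several GB, so expect the download to take a while.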

Run the model

On Windows, use PowerShell:

./main -m ./models/llama-2-7b-chat.Q4_0.gguf --color --ctx_size 2048 -n -1 -ins -b 256 --top_k 10000 --temp 0.2 --repeat_penalty 1.1 -t 8
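If you launch the model from a script, you can build the same command as an argument list. This is a sketch. build_main_args and run_main are hypothetical helper names. The binary defaults to ./main, matching the command above; on Windows the built file may be main.exe.

```python
import subprocess
from typing import List


def build_main_args(model_path: str, threads: int = 8, ctx_size: int = 2048,
                    temp: float = 0.2, binary: str = "./main") -> List[str]:
    """Argument list mirroring the ./main command above."""
    return [
        binary,
        "-m", model_path,             # model file
        "--color",                    # colorize output
        "--ctx_size", str(ctx_size),  # context size in tokens
        "-n", "-1",                   # -1: no fixed limit on generated tokens
        "-ins",                       # instruction mode
        "-b", "256",                  # batch size for prompt processing
        "--top_k", "10000",
        "--temp", str(temp),          # low temperature: more deterministic replies
        "--repeat_penalty", "1.1",
        "-t", str(threads),           # number of CPU threads
    ]


def run_main(model_path: str, **kwargs) -> int:
    # stdin/stdout are inherited, so the interactive session stays in this terminal
    return subprocess.call(build_main_args(model_path, **kwargs))
```

Keeping the flags in one function makes it easy to change a single value, such as the thread count, without retyping the whole command.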

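Method 2:

Meta has open-sourced Llama 2. Anyone can obtain the model by requesting access on the Meta AI website, accepting the license, and providing an email address. Meta then sends the download link by email.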

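Download Llama 2

Get the download.sh file and save it on your Mac.
Open Terminal and run chmod +x ./download.sh to make it executable.
Run ./download.sh to start the download.
Copy the download link from the email and paste it into the terminal.
Download only the 13B-chat model.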
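Large downloads can be interrupted or corrupted. If your downloaded model folder contains a checklist.chk file of MD5 sums, you can re-check the files with the sketch below. It assumes the file uses the md5sum format, with one "<hash>  <filename>" entry per line. md5_of and verify_checklist are hypothetical helper names.

```python
import hashlib
import os
from typing import Dict


def md5_of(path: str, chunk_size: int = 1 << 20) -> str:
    """MD5 of a file, read in chunks so multi-GB weights don't need to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_checklist(model_dir: str, checklist: str = "checklist.chk") -> Dict[str, bool]:
    """Map each listed filename to True if it exists and its MD5 matches (assumed md5sum format)."""
    results: Dict[str, bool] = {}
    with open(os.path.join(model_dir, checklist)) as f:
        for line in f:
            parts = line.split()
            if len(parts) != 2:
                continue  # skip blank or unexpected lines
            expected, name = parts
            name = name.lstrip("*")  # md5sum marks binary-mode entries with '*'
            path = os.path.join(model_dir, name)
            results[name] = os.path.exists(path) and md5_of(path) == expected
    return results
```

Any entry reported as False should be downloaded again before you convert the model.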

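Install system dependencies

The Xcode Command Line Tools are required to compile the C++ project. If you don't have them, run:

xcode-select --install

Next, install the tools used to build C++ projects:

brew install pkgconfig cmake

Finally, install PyTorch. This requires Python 3 and a virtual environment.

If you don't have Python 3 installed, install it with:

brew install python@3.11

Create a virtual environment like this:

/opt/homebrew/bin/python3.11 -m venv venv

Activate the venv:

source venv/bin/activate

Install PyTorch:

pip install --pre torch torchvision --extra-index-url https://download.pytorch.org/whl/nightly/cpu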

Build llama.cpp

Clone llama.cpp

git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp

Install the Python dependencies

pip3 install -r requirements.txt

Build

LLAMA_METAL=1 make

If both architectures (x86_64 and arm64) are available on your Mac, you can force an arm64 build with:

arch -arm64 make
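Whether you need arch -arm64 depends on whether your shell runs natively on Apple Silicon or as x86_64 under Rosetta 2. The sketch below checks this from Python. It relies on platform.machine() and on the macOS sysctl key sysctl.proc_translated, which reports 1 for translated processes. That reliance is an assumption; verify it on your system. The helper names are hypothetical.

```python
import os
import platform
import subprocess
from typing import List


def running_under_rosetta() -> bool:
    """True if this process is x86_64 code translated by Rosetta 2 (macOS only)."""
    if platform.system() != "Darwin":
        return False
    try:
        out = subprocess.run(["sysctl", "-n", "sysctl.proc_translated"],
                             capture_output=True, text=True, check=True).stdout
    except (OSError, subprocess.CalledProcessError):
        return False  # key missing, e.g. on Intel Macs
    return out.strip() == "1"


def build_command(machine: str, translated: bool) -> List[str]:
    """Force an arm64 build only when an Apple Silicon Mac runs this shell as x86_64."""
    if machine == "x86_64" and translated:
        return ["arch", "-arm64", "make"]
    return ["make"]


def run_build() -> int:
    cmd = build_command(platform.machine(), running_under_rosetta())
    env = dict(os.environ, LLAMA_METAL="1")  # same effect as LLAMA_METAL=1 make
    return subprocess.call(cmd, env=env)
```

On an Intel Mac, or in a native arm64 terminal, this falls back to plain make.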

Move the downloaded 13B-chat model folder (llama-2-13b-chat) into the models folder of the llama.cpp project.

Convert the model to ggml format
The steps differ between 13B and 70B. convert-pth-to-ggml.py is deprecated; use convert.py instead.

13B-chat

python3 convert.py --outfile ./models/llama-2-13b-chat/ggml-model-f16.bin --outtype f16 ./models/llama-2-13b-chat

Quantize the model:

To run these large LLMs on an ordinary laptop, we quantize the model with the command below. Quantization converts the weights from float16 to 4-bit integers (q4_0), which needs much less memory and costs only a little quality.
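A rough calculation shows how much memory this saves. The sketch assumes the q4_0 layout ggml used at the time: blocks of 32 weights, each stored as one fp16 scale plus 32 four-bit values, or 4.5 bits per weight. It uses nominal parameter counts and ignores file metadata, the context (KV) cache, and any tensors kept at higher precision, so real file sizes will differ somewhat.

```python
def model_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


F16_BITS = 16
# Assumed q4_0 block: 16-bit scale + 32 x 4-bit codes, shared by 32 weights
Q4_0_BITS = (16 + 32 * 4) / 32  # 4.5 bits per weight

for name, n_params in [("7B", 7e9), ("13B", 13e9)]:
    f16 = model_size_gb(n_params, F16_BITS)
    q4 = model_size_gb(n_params, Q4_0_BITS)
    print(f"{name}: f16 ~ {f16:.1f} GB, q4_0 ~ {q4:.1f} GB")
# 7B: f16 ~ 14.0 GB, q4_0 ~ 3.9 GB
# 13B: f16 ~ 26.0 GB, q4_0 ~ 7.3 GB
```

By this estimate, a 13B model drops from roughly 26 GB to roughly 7 GB, which fits in the RAM of many laptops.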

13B-chat:

./quantize ./models/llama-2-13b-chat/ggml-model-f16.bin ./models/llama-2-13b-chat/ggml-model-q4_0.bin q4_0
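To make "float16 to int4" concrete, here is a simplified pure-Python sketch of block-wise 4-bit quantization. It is modeled on ggml's q4_0 reference routine as I understand it. Each block of 32 weights gets one scale d, and each weight becomes a 4-bit code q with x ≈ (q - 8) * d. It is illustrative only: the real ./quantize is written in C, stores d as fp16, and supports many other types. The function names are hypothetical.

```python
from typing import List, Tuple

BLOCK = 32  # q4_0 quantizes weights in blocks of 32


def quantize_block(xs: List[float]) -> Tuple[float, List[int]]:
    """Return a scale d and 4-bit codes (0..15) such that x ~ (q - 8) * d."""
    peak = max(xs, key=abs)        # signed value with the largest magnitude
    d = peak / -8                  # chosen so that peak maps to code 0
    inv = 1.0 / d if d != 0 else 0.0
    # x * inv lies in [-8, 8]; adding 8.5 and truncating rounds to the nearest code
    codes = [min(15, int(x * inv + 8.5)) for x in xs]
    return d, codes


def dequantize_block(d: float, codes: List[int]) -> List[float]:
    return [(q - 8) * d for q in codes]


def pack_codes(codes: List[int]) -> bytes:
    """Two codes per byte: element j in the low nibble, element j + 16 in the high nibble."""
    half = len(codes) // 2
    return bytes(codes[j] | (codes[j + half] << 4) for j in range(half))


def quantize_row(xs: List[float]) -> List[Tuple[float, bytes]]:
    if len(xs) % BLOCK != 0:
        raise ValueError("row length must be a multiple of 32")
    out = []
    for i in range(0, len(xs), BLOCK):
        d, codes = quantize_block(xs[i:i + BLOCK])
        out.append((d, pack_codes(codes)))
    return out
```

For example, a block whose largest-magnitude weight is -1.6 gets d = 0.2. Every weight in that block is then reconstructed to within one step d of its original value. In the real format, those 32 weights take 2 bytes for the scale plus 16 bytes of packed codes.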

Run the model

./main -m ./models/llama-2-13b-chat/ggml-model-q4_0.bin -t 4 -c 2048 -n 2048 --color -i -r '### Question:' -p '### Question:'
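The options -i -r '### Question:' -p '### Question:' start an interactive session. -p supplies the initial prompt, and -r is a reverse prompt: when the model emits that text, generation halts and control returns to you. The toy loop below illustrates only that stopping rule. It is a simplified assumption about the behavior, not llama.cpp's actual code, and the function name is hypothetical.

```python
from typing import Iterable, List


def generate_until_reverse_prompt(tokens: Iterable[str], reverse_prompts: List[str]) -> str:
    """Accumulate streamed tokens until the text ends with any reverse prompt."""
    text = ""
    for tok in tokens:
        text += tok
        if any(text.endswith(rp) for rp in reverse_prompts):
            break  # here llama.cpp would stop and wait for the user's input
    return text
```

Because '### Question:' is also the initial prompt, each of your turns begins right after the marker, and the model's answer ends when it writes the next marker.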

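You can enable GPU inference with the -ngl 1 command-line argument (-ngl sets the number of layers to offload to the GPU). Any value greater than 0 moves part of the computation onto the GPU. For example: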

./main -m ./models/llama-2-13b-chat/ggml-model-q4_0.bin -t 4 -c 2048 -n 2048 --color -i -ngl 1 -r '### Question:' -p '### Question:'

In my tests on a Mac, this was about 25% faster than CPU-only inference.

Other notes

Llama 2 in ggml format

If you downloaded a model in ggml format, run the following command to convert it to GGUF:

python convert-llama-ggml-to-gguf.py --eps 1e-5 -i ./models/llama-2-13b-chat.ggmlv3.q4_0.bin -o ./models/llama-2-13b-chat.ggmlv3.q4_0.gguf.bin

(llama) C:\Users\Harry\PycharmProjects\llama.cpp>python convert-llama-ggml-to-gguf.py --eps 1e-5 -i ./models/llama-2-13b-chat.ggmlv3.q4_0.bin -o ./models/llama-2-13b-chat.ggmlv3.q4_0.gguf.bin
* Using config: Namespace(input=WindowsPath('models/llama-2-13b-chat.ggmlv3.q4_0.bin'), output=WindowsPath('models/llama-2-13b-chat.ggmlv3.q4_0.gguf.bin'), name=None, desc=None, gqa=1, eps='1e-5', context_length=2048, model_metadata_dir=None, vocab_dir=None, vocabtype='spm')

=== WARNING === Be aware that this conversion script is best-effort. Use a native GGUF model if possible. === WARNING ===

- Note: If converting LLaMA2, specifying "--eps 1e-5" is required. 70B models also need "--gqa 8".
* Scanning GGML input file
* File format: GGJTv3 with ftype MOSTLY_Q4_0
* GGML model hyperparameters: <Hyperparameters: n_vocab=32000, n_embd=5120, n_mult=256, n_head=40, n_layer=40, n_rot=128, n_ff=13824, ftype=MOSTLY_Q4_0>

=== WARNING === Special tokens may not be converted correctly. Use --model-metadata-dir if possible === WARNING ===

* Preparing to save GGUF file
This gguf file is for Little Endian only
* Adding model parameters and KV items
* Adding 32000 vocab item(s)
* Adding 363 tensor(s)
    gguf: write header
    gguf: write metadata
    gguf: write tensors
* Successful completion. Output saved to: models\llama-2-13b-chat.ggmlv3.q4_0.gguf.bin
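To tell whether a downloaded file still needs this conversion, you can check its first bytes. This sketch assumes the magic numbers llama.cpp used at the time. GGUF files begin with the ASCII bytes GGUF. Older GGML-family files begin with 'ggjt', 'ggmf' or 'ggml' stored as a little-endian 32-bit integer, so the bytes appear reversed on disk. detect_format is a hypothetical helper.

```python
import struct
from typing import Optional, Tuple

# First four bytes on disk for each container format (assumed magic values)
MAGICS = {
    b"GGUF": "gguf",
    b"tjgg": "ggjt",  # 'ggjt' as a little-endian uint32 (GGJT v1-v3)
    b"fmgg": "ggmf",  # 'ggmf'
    b"lmgg": "ggml",  # original unversioned GGML
}
VERSIONED = {"gguf", "ggjt", "ggmf"}  # these store a uint32 version after the magic


def detect_format(path: str) -> Tuple[Optional[str], Optional[int]]:
    """Return (format, version) based on the file header, or (None, None) if unknown."""
    with open(path, "rb") as f:
        fmt = MAGICS.get(f.read(4))
        version = None
        if fmt in VERSIONED:
            raw = f.read(4)
            if len(raw) == 4:
                version = struct.unpack("<I", raw)[0]
    return fmt, version
```

A file reported as ggjt, like the GGJTv3 input in the log above, needs convert-llama-ggml-to-gguf.py. A gguf file can be passed to ./main directly.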

References

GitHub - facebookresearch/llama: Inference code for LLaMA models

A comprehensive guide to running Llama 2 locally – Replicate
