References:
Project: https://github.com/PromtEngineer/localGPT
Model: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML
Cloud counterpart: Building a cloud-hosted, customized PDF knowledge-base AI chatbot with GPT-4 and LangChain (Entropy-Go's blog, CSDN)
1. Summary
Unlike OpenAI's ChatGPT, which must be reached over the network and called in the cloud through an API key (raising data-privacy concerns), this project uses Llama 2 and LangChain to build a local, customized knowledge-base AI chatbot: a pre-trained LLM is deployed locally, so you can ask questions about your own files without any network connection. The deployment is 100% private and local, and no data ever leaves your environment. You can load documents and ask questions completely offline!
This is a cutting-edge application that lets users tap the power of a language model without an internet connection, serving as an indispensable resource for getting at information beyond the limits of traditional tools such as ChatGPT.
A key advantage of the application is that you keep control of your data. This matters most when handling files that must stay inside your organization, or personal documents of the highest confidentiality, because it removes the need to pass information through third-party channels.
Integrating your own documents is simple and the experience stays smooth. Whether the files are plain text, PDF, CSV, or Excel, you just supply the material you want to query. The application processes these documents quickly and builds a comprehensive database for the model to draw on, enabling accurate and in-depth answers.
Another notable advantage is efficient resource usage. Unlike the resource-intensive retraining required by alternative approaches, document ingestion here needs far less compute, which streamlines the experience and saves both time and computing resources.
In short, this tool lets you exploit the full potential of a language model while completely offline: a new way to access information, boost productivity, and put your own data to work.
2. Prerequisites
2.1 Meta's Llama 2 7b Chat GGML
These files are GGML-format model files for Meta's Llama 2 7B Chat.
GGML files are for CPU + GPU inference using llama.cpp and the libraries and UIs that support this format.
2.2 Install Conda
See: Quick installation of the Conda package manager on CentOS (Entropy-Go's blog, CSDN)
2.3 Upgrade gcc
See: Introduction to gcc on CentOS and how to upgrade it quickly (Entropy-Go's blog, CSDN)
3. Clone or download the localGPT project
git clone https://github.com/PromtEngineer/localGPT.git
4. Install dependencies
4.1 Create and activate the Conda environment
conda create -n localGPT
conda activate localGPT
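Optionally, you can pin the Python version when creating the environment. The version below is an assumption on my part (3.10 is commonly used with this project); check the project README or requirements.txt for the exact supported version:
conda create -n localGPT python=3.10 -y
conda activate localGPT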
4.2 Install the Python packages
If the Conda environment variables are set up correctly, just run pip install:
pip install -r requirements.txt
Otherwise the system Python will be used instead. In that case, call pip through the absolute path of the Conda installation, and keep using Conda's Python for all later steps:
whereis conda
conda: /root/miniconda3/bin/conda /root/miniconda3/condabin/conda
/root/miniconda3/bin/pip install -r requirements.txt
If the installation fails with the error below, see section 2.3 and upgrade gcc (gcc 11 is recommended):
ERROR: Could not build wheels for llama-cpp-python, hnswlib, lxml, which is required to install pyproject.toml-based project
5. Add documents to the knowledge base
5.1 Document directory and sample document
Replace the sample document with the ones you actually want to query:
~/localGPT/SOURCE_DOCUMENTS/constitution.pdf
Before ingesting, check the help output. As noted earlier, it is best to invoke Python through the Conda absolute path:
/root/miniconda3/bin/python ingest.py --help
Usage: ingest.py [OPTIONS]
Options:
--device_type [cpu|cuda|ipu|xpu|mkldnn|opengl|opencl|ideep|hip|ve|fpga|ort|xla|lazy|vulkan|mps|meta|hpu|mtia]
Device to run on. (Default is cuda)
--help Show this message and exit.
5.2 Ingest the documents
By default, ingestion runs on CUDA/GPU:
/root/miniconda3/bin/python ingest.py
Or force CPU:
/root/miniconda3/bin/python ingest.py --device_type cpu
On the first ingestion run, the embedding model is downloaded and the vector database is created and persisted under /root/localGPT/DB.
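Under the hood, ingest.py builds the vector store with LangChain. The following is a minimal sketch of the same pipeline, not the project's exact code; it assumes the LangChain 0.0.x APIs the project relied on at the time, and the chunk sizes are illustrative:

from langchain.document_loaders import PDFMinerLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Chroma

# Load the bundled sample document and split it into overlapping chunks
docs = PDFMinerLoader("SOURCE_DOCUMENTS/constitution.pdf").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200).split_documents(docs)

# Create embeddings locally with instructor-large (the model the log below shows downloading)
embeddings = HuggingFaceInstructEmbeddings(
    model_name="hkunlp/instructor-large",
    model_kwargs={"device": "cuda"},  # use "cpu" to mirror --device_type cpu
)

# Persist everything into the local Chroma store under ./DB
db = Chroma.from_documents(chunks, embeddings, persist_directory="DB")
db.persist()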
Output from the first ingestion run:
/root/miniconda3/bin/python ingest.py
2023-08-18 09:36:55,389 - INFO - ingest.py:122 - Loading documents from /root/localGPT/SOURCE_DOCUMENTS
all files: ['constitution.pdf']
2023-08-18 09:36:55,398 - INFO - ingest.py:34 - Loading document batch
2023-08-18 09:36:56,818 - INFO - ingest.py:131 - Loaded 1 documents from /root/localGPT/SOURCE_DOCUMENTS
2023-08-18 09:36:56,818 - INFO - ingest.py:132 - Split into 72 chunks of text
2023-08-18 09:36:57,994 - INFO - SentenceTransformer.py:66 - Load pretrained SentenceTransformer: hkunlp/instructor-large
Downloading (…)c7233/.gitattributes: 100%|███████████████████████████████████████████████████████████████████████████| 1.48k/1.48k [00:00<00:00, 4.13MB/s]
Downloading (…)_Pooling/config.json: 100%|████████████████████████████████████████████████████████████████████████████████| 270/270 [00:00<00:00, 915kB/s]
Downloading (…)/2_Dense/config.json: 100%|████████████████████████████████████████████████████████████████████████████████| 116/116 [00:00<00:00, 380kB/s]
Downloading pytorch_model.bin: 100%|█████████████████████████████████████████████████████████████████████████████████| 3.15M/3.15M [00:01<00:00, 2.99MB/s]
Downloading (…)9fb15c7233/README.md: 100%|████████████████████████████████████████████████████████████████████████████| 66.3k/66.3k [00:00<00:00, 359kB/s]
Downloading (…)b15c7233/config.json: 100%|███████████████████████████████████████████████████████████████████████████| 1.53k/1.53k [00:00<00:00, 5.70MB/s]
Downloading (…)ce_transformers.json: 100%|████████████████████████████████████████████████████████████████████████████████| 122/122 [00:00<00:00, 485kB/s]
Downloading pytorch_model.bin: 100%|█████████████████████████████████████████████████████████████████████████████████| 1.34G/1.34G [03:15<00:00, 6.86MB/s]
Downloading (…)nce_bert_config.json: 100%|██████████████████████████████████████████████████████████████████████████████| 53.0/53.0 [00:00<00:00, 109kB/s]
Downloading (…)cial_tokens_map.json: 100%|███████████████████████████████████████████████████████████████████████████| 2.20k/2.20k [00:00<00:00, 8.96MB/s]
Downloading spiece.model: 100%|████████████████████████████████████████████████████████████████████████████████████████| 792k/792k [00:00<00:00, 3.46MB/s]
Downloading (…)c7233/tokenizer.json: 100%|███████████████████████████████████████████████████████████████████████████| 2.42M/2.42M [00:00<00:00, 3.01MB/s]
Downloading (…)okenizer_config.json: 100%|███████████████████████████████████████████████████████████████████████████| 2.41k/2.41k [00:00<00:00, 9.75MB/s]
Downloading (…)15c7233/modules.json: 100%|███████████████████████████████████████████████████████████████████████████████| 461/461 [00:00<00:00, 1.92MB/s]
load INSTRUCTOR_Transformer
2023-08-18 09:40:26,658 - INFO - instantiator.py:21 - Created a temporary directory at /tmp/tmp47gnnhwi
2023-08-18 09:40:26,658 - INFO - instantiator.py:76 - Writing /tmp/tmp47gnnhwi/_remote_module_non_scriptable.py
max_seq_length  512
2023-08-18 09:40:30,076 - INFO - __init__.py:88 - Running Chroma using direct local API.
2023-08-18 09:40:30,248 - WARNING - __init__.py:43 - Using embedded DuckDB with persistence: data will be stored in: /root/localGPT/DB
2023-08-18 09:40:30,252 - INFO - ctypes.py:22 - Successfully imported ClickHouse Connect C data optimizations
2023-08-18 09:40:30,257 - INFO - json_impl.py:45 - Using python library for writing JSON byte strings
2023-08-18 09:40:30,295 - INFO - duckdb.py:454 - No existing DB found in /root/localGPT/DB, skipping load
2023-08-18 09:40:30,295 - INFO - duckdb.py:466 - No existing DB found in /root/localGPT/DB, skipping load
2023-08-18 09:40:32,800 - INFO - duckdb.py:414 - Persisting DB to disk, putting it in the save folder: /root/localGPT/DB
2023-08-18 09:40:32,813 - INFO - duckdb.py:414 - Persisting DB to disk, putting it in the save folder: /root/localGPT/DB
Project file listing:
ls
ACKNOWLEDGEMENT.md CONTRIBUTING.md ingest.py localGPT_UI.py README.md run_localGPT.py
constants.py DB LICENSE __pycache__ requirements.txt SOURCE_DOCUMENTS
constitution.pdf Dockerfile localGPTUI pyproject.toml run_localGPT_API.py
6. Run the knowledge-base AI chatbot
You can now chat with your local knowledge base!
6.1 Asking questions from the command line
On the first run, the default model defined in ~/localGPT/constants.py is downloaded:
# model link: https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML
MODEL_ID = "TheBloke/Llama-2-7B-Chat-GGML"
MODEL_BASENAME = "llama-2-7b-chat.ggmlv3.q4_0.bin"
The model is downloaded to /root/.cache/huggingface/hub/models--TheBloke--Llama-2-7B-Chat-GGML.
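Internally, the GGML file is fetched from the Hugging Face Hub and served through llama.cpp via LangChain's LlamaCpp wrapper. A minimal sketch follows; this is an illustration rather than the project's exact loading code, and the parameter values are assumptions that mirror the n_ctx = 2048 shown in the logs:

from huggingface_hub import hf_hub_download
from langchain.llms import LlamaCpp

# Download (or reuse from the local cache) the quantized GGML weights
model_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-7B-Chat-GGML",
    filename="llama-2-7b-chat.ggmlv3.q4_0.bin",
)

# Load the model with llama.cpp; the context window matches the logs below
llm = LlamaCpp(model_path=model_path, n_ctx=2048, max_tokens=512, n_batch=512)
print(llm("Summarize the purpose of the US constitution in one sentence."))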
Run it directly:
/root/miniconda3/bin/python run_localGPT.py
Chat input
English queries work out of the box; Chinese queries need additional UTF-8 handling in the code.
Enter a query:
Chat transcript:
/root/miniconda3/bin/python run_localGPT.py
2023-08-18 09:43:02,433 - INFO - run_localGPT.py:180 - Running on: cuda
2023-08-18 09:43:02,433 - INFO - run_localGPT.py:181 - Display Source Documents set to: False
2023-08-18 09:43:02,676 - INFO - SentenceTransformer.py:66 - Load pretrained SentenceTransformer: hkunlp/instructor-large
load INSTRUCTOR_Transformer
max_seq_length  512
2023-08-18 09:43:05,301 - INFO - __init__.py:88 - Running Chroma using direct local API.
2023-08-18 09:43:05,317 - WARNING - __init__.py:43 - Using embedded DuckDB with persistence: data will be stored in: /root/localGPT/DB
2023-08-18 09:43:05,328 - INFO - ctypes.py:22 - Successfully imported ClickHouse Connect C data optimizations
2023-08-18 09:43:05,336 - INFO - json_impl.py:45 - Using python library for writing JSON byte strings
2023-08-18 09:43:05,402 - INFO - duckdb.py:460 - loaded in 72 embeddings
2023-08-18 09:43:05,405 - INFO - duckdb.py:472 - loaded in 1 collections
2023-08-18 09:43:05,406 - INFO - duckdb.py:89 - collection with name langchain already exists, returning existing collection
2023-08-18 09:43:05,406 - INFO - run_localGPT.py:45 - Loading Model: TheBloke/Llama-2-7B-Chat-GGML, on: cuda
2023-08-18 09:43:05,406 - INFO - run_localGPT.py:46 - This action can take a few minutes!
2023-08-18 09:43:05,406 - INFO - run_localGPT.py:50 - Using Llamacpp for GGML quantized models
Downloading (…)chat.ggmlv3.q4_0.bin: 100%|███████████████████████████████████████████████████████████████████████████| 3.79G/3.79G [09:53<00:00, 6.39MB/s]
llama.cpp: loading model from /root/.cache/huggingface/hub/models--TheBloke--Llama-2-7B-Chat-GGML/snapshots/b616819cd4777514e3a2d9b8be69824aca8f5daf/llama-2-7b-chat.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.07 MB
llama_model_load_internal: mem required  = 5407.71 MB (+ 1026.00 MB per state)
llama_new_context_with_model: kv self size  = 1024.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |Enter a query:
Alternatively, add the --show_sources flag to print the source passages used for each answer:
/root/miniconda3/bin/python run_localGPT.py --show_sources
Chat transcript:
/root/miniconda3/bin/python run_localGPT.py --show_sources
2023-08-18 10:03:55,466 - INFO - run_localGPT.py:180 - Running on: cuda
2023-08-18 10:03:55,466 - INFO - run_localGPT.py:181 - Display Source Documents set to: True
2023-08-18 10:03:55,708 - INFO - SentenceTransformer.py:66 - Load pretrained SentenceTransformer: hkunlp/instructor-large
load INSTRUCTOR_Transformer
max_seq_length  512
2023-08-18 10:03:58,302 - INFO - __init__.py:88 - Running Chroma using direct local API.
2023-08-18 10:03:58,307 - WARNING - __init__.py:43 - Using embedded DuckDB with persistence: data will be stored in: /root/localGPT/DB
2023-08-18 10:03:58,312 - INFO - ctypes.py:22 - Successfully imported ClickHouse Connect C data optimizations
2023-08-18 10:03:58,318 - INFO - json_impl.py:45 - Using python library for writing JSON byte strings
2023-08-18 10:03:58,372 - INFO - duckdb.py:460 - loaded in 72 embeddings
2023-08-18 10:03:58,373 - INFO - duckdb.py:472 - loaded in 1 collections
2023-08-18 10:03:58,373 - INFO - duckdb.py:89 - collection with name langchain already exists, returning existing collection
2023-08-18 10:03:58,374 - INFO - run_localGPT.py:45 - Loading Model: TheBloke/Llama-2-7B-Chat-GGML, on: cuda
2023-08-18 10:03:58,374 - INFO - run_localGPT.py:46 - This action can take a few minutes!
2023-08-18 10:03:58,374 - INFO - run_localGPT.py:50 - Using Llamacpp for GGML quantized models
llama.cpp: loading model from /root/.cache/huggingface/hub/models--TheBloke--Llama-2-7B-Chat-GGML/snapshots/b616819cd4777514e3a2d9b8be69824aca8f5daf/llama-2-7b-chat.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.07 MB
llama_model_load_internal: mem required  = 5407.71 MB (+ 1026.00 MB per state)
llama_new_context_with_model: kv self size  = 1024.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |Enter a query: how many times could president act, and how many years as max?
llama_print_timings:        load time = 19737.32 ms
llama_print_timings:      sample time =   101.14 ms /   169 runs   (    0.60 ms per token,  1671.02 tokens per second)
llama_print_timings: prompt eval time = 19736.91 ms /   925 tokens (   21.34 ms per token,    46.87 tokens per second)
llama_print_timings:        eval time = 36669.35 ms /   168 runs   (  218.27 ms per token,     4.58 tokens per second)
llama_print_timings:       total time = 56849.80 ms
> Question:
how many times could president act, and how many years as max?
> Answer:
The answer to this question can be found in Amendment XXII and Amendment XXIII of the US Constitution. According to these amendments, a person cannot be elected President more than twice, and no person can hold the office of President for more than two years of a term to which someone else was elected President. However, if the President is unable to discharge their powers and duties due to incapacity, the Vice President will continue to act as President until Congress determines the issue.
In summary, a person can be elected President at most twice, and they cannot hold the office for more than two years of a term to which someone else was elected President. If the President becomes unable to discharge their powers and duties, the Vice President will continue to act as President until Congress makes a determination.
----------------------------------SOURCE DOCUMENTS---------------------------
> /root/localGPT/SOURCE_DOCUMENTS/constitution.pdf:
Amendment XXII. Amendment XXIII.
Passed by Congress March 21, 1947. Ratified February 27,
Passed by Congress June 16, 1960. Ratified March 29, 1961.
951.
SECTION 1
...
SECTION 2
....
----------------------------------SOURCE DOCUMENTS---------------------------
Enter a query: exit
6.2 Asking questions through the Web UI
6.2.1 Start the server-side API
To use the Web UI, first start the server-side API, which listens on port 5110:
http://127.0.0.1:5110
/root/miniconda3/bin/python run_localGPT_API.py
If you hit the error below when starting it, it is again because the script invokes python without using the Conda installation on PATH.
/root/miniconda3/bin/python run_localGPT_API.py
load INSTRUCTOR_Transformer
max_seq_length  512
The directory does not exist
run_langest_commands ['python', 'ingest.py']
Traceback (most recent call last):
  File "/root/localGPT/run_localGPT_API.py", line 56, in <module>
    raise FileNotFoundError(
FileNotFoundError: No files were found inside SOURCE_DOCUMENTS, please put a starter file inside before starting the API!
Fix this by editing ~/localGPT/run_localGPT_API.py and replacing the plain python call with the Conda path, i.e. change
run_langest_commands = ["python", "ingest.py"]
to
run_langest_commands = ["/root/miniconda3/bin/python", "ingest.py"]
Startup output
Once you see INFO:werkzeug: lines, the API has started successfully. Keep this terminal open for debugging.
/root/miniconda3/bin/python run_localGPT_API.py
load INSTRUCTOR_Transformer
max_seq_length  512
WARNING:chromadb:Using embedded DuckDB with persistence: data will be stored in: /root/localGPT/DB
llama.cpp: loading model from /root/.cache/huggingface/hub/models--TheBloke--Llama-2-7B-Chat-GGML/snapshots/b616819cd4777514e3a2d9b8be69824aca8f5daf/llama-2-7b-chat.ggmlv3.q4_0.bin
llama_model_load_internal: format     = ggjt v3 (latest)
llama_model_load_internal: n_vocab    = 32000
llama_model_load_internal: n_ctx      = 2048
llama_model_load_internal: n_embd     = 4096
llama_model_load_internal: n_mult     = 256
llama_model_load_internal: n_head     = 32
llama_model_load_internal: n_layer    = 32
llama_model_load_internal: n_rot      = 128
llama_model_load_internal: ftype      = 2 (mostly Q4_0)
llama_model_load_internal: n_ff       = 11008
llama_model_load_internal: n_parts    = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size =    0.07 MB
llama_model_load_internal: mem required  = 5407.71 MB (+ 1026.00 MB per state)
llama_new_context_with_model: kv self size  = 1024.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
 * Serving Flask app 'run_localGPT_API'
 * Debug mode: on
INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://127.0.0.1:5110
INFO:werkzeug:Press CTRL+C to quit
INFO:werkzeug: * Restarting with watchdog (inotify)
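Once the API is listening, you can sanity-check it without the UI. The route below is an assumption based on the run_localGPT_API.py source of that period (a prompt endpoint accepting a form field named user_prompt); check the file for the actual route names if this returns a 404:

curl -X POST -F "user_prompt=What does the constitution say about presidential terms?" http://127.0.0.1:5110/api/prompt_route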
6.2.2 Start the server-side UI
Open a new terminal and run ~/localGPT/localGPTUI/localGPTUI.py; the UI server listens on port 5111:
http://127.0.0.1:5111
/root/miniconda3/bin/python localGPTUI.py
For access from other machines on the LAN, edit localGPTUI.py and change the host from 127.0.0.1 to 0.0.0.0:
parser.add_argument("--host", type=str, default="0.0.0.0",
help="Host to run the UI on. Defaults to 127.0.0.1. "
"Set to 0.0.0.0 to make the UI externally "
"accessible from other devices.")
Run output
/root/miniconda3/bin/python localGPTUI.py
 * Serving Flask app 'localGPTUI'
 * Debug mode: on
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:5111
 * Running on http://IP:5111
Port usage:
netstat -nltp | grep 511
tcp 0 0 127.0.0.1:5110 0.0.0.0:* LISTEN 57479/python
tcp 0 0 0.0.0.0:5111 0.0.0.0:* LISTEN 21718/python
6.2.3 Open the Web UI in a browser
Local machine: http://127.0.0.1:5111
LAN: http://IP:5111
You can chat freely in the web page, and Chinese input is supported.
Usage screenshots (not reproduced here).
6.3 Swap in your own documents as the knowledge base
6.3.1 Command-line method
Add your documents directly to ~/localGPT/SOURCE_DOCUMENTS/
Then re-run ingest.py to update the vector database (see the commands below). Once the update finishes, you can chat against the new content as usual.
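A typical sequence looks roughly like this (paths follow the earlier sections; removing the DB directory is optional and simply forces a clean rebuild):

# drop the new document(s) into the source folder
cp /path/to/your_doc.pdf ~/localGPT/SOURCE_DOCUMENTS/
# optional: wipe the old vector store to rebuild from scratch
rm -rf /root/localGPT/DB
# re-run ingestion with the Conda python, as before
/root/miniconda3/bin/python ingest.py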
6.3.2 Web UI method
Uploading files
1. To upload documents for the application to ingest as its new knowledge base, click the upload button.
2. Select the documents you want to chat against as the new knowledge base.
3. You will then be prompted to add the documents to the existing knowledge base, reset the knowledge base with only the documents you just selected, or cancel the upload.
4. There is a short wait while the documents are ingested into the vector database as the new knowledge base.
Screenshots (not reproduced here): ingesting a Chinese document into the knowledge base, generating a reply, and the returned result.
7. Troubleshooting
7.1 Ingesting Chinese documents
In run_localGPT_API.py, set:
max_ctx_size = 4096
In ingest.py, set:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=800, chunk_overlap=200)
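On top of the two changes above, an optional tweak (an assumption on my part, not something the original project ships) is to pass Chinese punctuation to the splitter as separators, since Chinese text contains few spaces:

from langchain.text_splitter import RecursiveCharacterTextSplitter

# prefer breaking chunks at Chinese sentence/clause boundaries
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=800,
    chunk_overlap=200,
    separators=["\n\n", "\n", "。", "!", "?", ";", ",", " ", ""],
)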
7.2 The page loads but questions get no reply (response.status_code = 504 or 304)
If your environment uses a proxy, unset it before starting the UI server:
unset http_proxy
unset https_proxy
unset ftp_proxy
/root/miniconda3/bin/python localGPTUI.py
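If other traffic still needs the proxy, an alternative (not in the original post) is to exempt only local addresses instead of unsetting the proxy entirely:

export no_proxy=127.0.0.1,localhost
export NO_PROXY=127.0.0.1,localhost
/root/miniconda3/bin/python localGPTUI.py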
7.3 How localGPT works
By selecting the right local models and leveraging the power of LangChain, you can run the entire pipeline locally, without any data leaving your environment, and with reasonable performance.
- ingest.py uses LangChain tools to parse the documents and create embeddings locally using InstructorEmbeddings. It then stores the result in a local vector database using the Chroma vector store.
- run_localGPT.py uses a local LLM to understand questions and create answers. The context for the answers is extracted from the local vector store using a similarity search to locate the right piece of context from the docs.
- You can replace this local LLM with any other LLM from HuggingFace. Make sure whatever LLM you select is in the HF format.
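As a rough illustration of the answering step, the retrieval-plus-LLM chain can be sketched like this (a minimal example using the LangChain 0.0.x APIs of that period, not the project's exact code; paths are the ones used earlier in this post):

from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.vectorstores import Chroma
from langchain.llms import LlamaCpp
from langchain.chains import RetrievalQA

# Re-open the vector store created by ingest.py
embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
db = Chroma(persist_directory="/root/localGPT/DB", embedding_function=embeddings)

# Local LLM answers questions using chunks found by similarity search
llm = LlamaCpp(
    model_path="/root/.cache/huggingface/hub/models--TheBloke--Llama-2-7B-Chat-GGML/"
               "snapshots/b616819cd4777514e3a2d9b8be69824aca8f5daf/llama-2-7b-chat.ggmlv3.q4_0.bin",
    n_ctx=2048,
)
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=db.as_retriever(),
    return_source_documents=True,
)

result = qa("How many times can a president be elected?")
print(result["result"])
for doc in result["source_documents"]:
    print(doc.metadata)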
7.4 How to select a different LLM
The following instructions describe how to select a different LLM model to create your responses:
- Open constants.py in the editor of your choice.
- Change MODEL_ID and MODEL_BASENAME. If you are using a quantized model (GGML, GPTQ), you will need to provide MODEL_BASENAME. For unquantized models, set MODEL_BASENAME to NONE.
- A number of example models from HuggingFace have already been tested: original trained models (ending with HF, or with a .bin file in "Files and versions") and quantized models (ending with GPTQ, or with .no-act-order or .safetensors files in "Files and versions").
- For models that end with HF or have a .bin file in "Files and versions" on their HuggingFace page:
  - Make sure you have a MODEL_ID selected, for example MODEL_ID = "TheBloke/guanaco-7B-HF".
  - If you go to the model's HuggingFace repo and open "Files and versions", you will notice model files that end with a .bin extension.
  - Any model files with a .bin extension are loaded by the code under the "# load the LLM for generating Natural Language responses" comment.
- For models whose name contains GPTQ, or that have .no-act-order or .safetensors files in "Files and versions" on their HuggingFace page:
  - Make sure you have a MODEL_ID selected, for example model_id = "TheBloke/wizardLM-7B-GPTQ".
  - You will also need its model basename, for example model_basename = "wizardLM-7B-GPTQ-4bit.compat.no-act-order.safetensors".
  - If you go to the model's HuggingFace repo and open "Files and versions", you will notice a model file that ends with a .safetensors extension.
  - Any model files with no-act-order or .safetensors extensions are loaded by the code under the "# load the LLM for generating Natural Language responses" comment, for example:
    MODEL_ID = "TheBloke/WizardLM-7B-uncensored-GPTQ"
    MODEL_BASENAME = "WizardLM-7B-uncensored-GPTQ-4bit-128g.compat.no-act-order.safetensors"
- Comment out all other instances of MODEL_ID="other model names", MODEL_BASENAME="other base model names", and llm = load_model(args*).
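For example, switching from the default GGML model to one of the GPTQ models above would look roughly like this in constants.py (a sketch; only these two variables change, the rest of the file stays as shipped):

# constants.py (excerpt)
# default GGML model, commented out while the GPTQ model is active
# MODEL_ID = "TheBloke/Llama-2-7B-Chat-GGML"
# MODEL_BASENAME = "llama-2-7b-chat.ggmlv3.q4_0.bin"

MODEL_ID = "TheBloke/WizardLM-7B-uncensored-GPTQ"
MODEL_BASENAME = "WizardLM-7B-uncensored-GPTQ-4bit-128g.compat.no-act-order.safetensors"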
7.5 More issues
See the project's issue tracker: Issues · PromtEngineer/localGPT · GitHub (https://github.com/PromtEngineer/localGPT/issues)