Reference:
GitHub - mayooear/gpt4-pdf-chatbot-langchain: GPT4 & LangChain Chatbot for large PDF docs
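1. Abstract:
Build a ChatGPT-style chatbot over multiple large PDF files using the new GPT-4 API.
The tech stack is LangChain, Pinecone, TypeScript, OpenAI, and Next.js. LangChain is a framework that makes it easier to build scalable AI/LLM applications and chatbots. Pinecone is a vector store that holds the embeddings and the text of your PDFs so that similar documents can be retrieved later.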
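To make "retrieving similar documents" concrete, here is a toy in-memory sketch of similarity search. It assumes cosine similarity as the metric and uses made-up three-dimensional vectors. The names `StoredChunk`, `cosineSimilarity`, and `topK` are hypothetical and are not part of Pinecone or LangChain; a real vector store does this at scale behind its own API.

```typescript
// Toy stand-in for a vector store. This is NOT Pinecone's API;
// every name in this block is hypothetical and for illustration only.
interface StoredChunk {
  text: string;        // chunk of PDF text kept alongside its embedding
  embedding: number[]; // vector that represents the text
}

// Cosine similarity: 1 means same direction, 0 means orthogonal (unrelated).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same dimension');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Return the k stored chunks whose embeddings are most similar to the query.
function topK(query: number[], store: StoredChunk[], k: number): StoredChunk[] {
  return store
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((entry) => entry.chunk);
}

// Made-up embeddings; real ones come from an embedding model.
const store: StoredChunk[] = [
  { text: 'Refund policy', embedding: [1, 0, 0] },
  { text: 'Shipping times', embedding: [0, 1, 0] },
  { text: 'Returns and refunds', embedding: [0.9, 0.1, 0] },
];
const nearest = topK([1, 0, 0], store, 2).map((c) => c.text);
```

In the real application the query vector is the embedding of the user's question, and the chunks returned are what the model receives as context.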
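2. Prerequisites:
OpenAI API key (GPT-3.5 or GPT-4), obtained from OpenAI
Pinecone API key, environment, and index, obtained from Pinecone
Indexes of users on the Pinecone Starter (free) plan are deleted after 7 days of inactivity. To prevent this, send an API request to Pinecone before the 7 days are up to reset the counter. You can then keep using the free plan.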
3. Clone or download the gpt4-pdf-chatbot-langchain project
git clone https://github.com/mayooear/gpt4-pdf-chatbot-langchain.git
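4. Install dependencies
Use npm to install yarn. If you do not have npm, see this installation guide:
npm/Node.js introduction and quick installation - Linux CentOS (Entropy-Go's blog, CSDN)
npm install yarn -g
Then use yarn to install the dependencies.
Change to the project root directory and run:
yarn install
After a successful install, the node_modules directory appears: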
gpt4-pdf-chatbot-langchain-main$ ls -a
. declarations .eslintrc.json node_modules .prettierrc styles utils yarn.lock
.. docs .gitignore package.json public tailwind.config.cjs venv
components .env .idea pages README.md tsconfig.json visual-guide
config .env.example next.config.js postcss.config.cjs scripts types yarn-error.log
5. Environment configuration
Copy .env.example to a new .env configuration file:
OPENAI_API_KEY=sk-xxx
# Update these with your pinecone details from your dashboard.
# PINECONE_INDEX_NAME is in the indexes tab under "index name" in blue
# PINECONE_ENVIRONMENT is in indexes tab under "Environment". Example: "us-east1-gcp"
PINECONE_API_KEY=xxx
PINECONE_ENVIRONMENT=us-west1-gcp-free
PINECONE_INDEX_NAME=xxx
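A quick way to catch a half-filled .env is to check the four variables at startup. The sketch below is a hypothetical helper, not code from the repo: the variable names come from .env.example above, while `missingEnvVars` and the idea of treating `xxx` and `sk-xxx` as unfilled placeholders are assumptions made for illustration.

```typescript
// Variable names are taken from .env.example; the helper itself is hypothetical.
const REQUIRED_ENV_VARS = [
  'OPENAI_API_KEY',
  'PINECONE_API_KEY',
  'PINECONE_ENVIRONMENT',
  'PINECONE_INDEX_NAME',
] as const;

type Env = Record<string, string | undefined>;

// Return the names of required variables that are missing, empty,
// or still set to the placeholder values from .env.example.
function missingEnvVars(env: Env): string[] {
  return REQUIRED_ENV_VARS.filter((name) => {
    const value = env[name]?.trim();
    return !value || value === 'xxx' || value === 'sk-xxx';
  });
}
```

In a Node.js script you would call `missingEnvVars(process.env)` after the .env file has been loaded and stop with an error if the returned list is not empty.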
Edit config/pinecone.ts
In the config folder, replace PINECONE_NAME_SPACE with the namespace in which you want your embeddings stored when you run npm run ingest. The same namespace is used later for querying and retrieval.
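Change the chatbot prompt and the OpenAI model
In utils/makechain.ts, change QA_PROMPT to suit your own use case.
If you have access to the gpt-4 API, change modelName in new OpenAI to gpt-4. Verify outside this repo that you have access to the gpt-4 API; otherwise the application will not work.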
import { OpenAI } from 'langchain/llms/openai';
import { PineconeStore } from 'langchain/vectorstores/pinecone';
import { ConversationalRetrievalQAChain } from 'langchain/chains';
const CONDENSE_PROMPT = `Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.
Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:`;
const QA_PROMPT = `You are a helpful AI assistant. Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say you don't know. DO NOT try to make up an answer.
If the question is not related to the context, politely respond that you are tuned to only answer questions that are related to the context.
{context}
Question: {question}
Helpful answer in markdown:`;
export const makeChain = (vectorstore: PineconeStore) => {
  const model = new OpenAI({
    temperature: 0, // increase temperature to get more creative answers
    modelName: 'gpt-3.5-turbo', // change this to gpt-4 if you have access
  });

  const chain = ConversationalRetrievalQAChain.fromLLM(
    model,
    vectorstore.asRetriever(),
    {
      qaTemplate: QA_PROMPT,
      questionGeneratorTemplate: CONDENSE_PROMPT,
      returnSourceDocuments: true, // the number of source documents returned is 4 by default
    },
  );
  return chain;
};
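The `{context}` and `{question}` markers in the prompts above are placeholders that get filled in before the text is sent to the model. The sketch below only illustrates that substitution step. `fillTemplate` is a hypothetical helper, not a LangChain function, and LangChain's own formatting may differ in its details.

```typescript
// Illustration only: LangChain does its own prompt formatting internally.
// fillTemplate is a hypothetical helper written for this example.
function fillTemplate(template: string, values: Record<string, string>): string {
  return template.replace(/\{(\w+)\}/g, (match: string, key: string) => {
    // Only substitute keys that were explicitly provided;
    // unknown placeholders are left untouched.
    const value = Object.prototype.hasOwnProperty.call(values, key)
      ? values[key]
      : undefined;
    return value ?? match;
  });
}

// Shortened version of the tail of QA_PROMPT.
const template = `{context}
Question: {question}
Helpful answer in markdown:`;

const prompt = fillTemplate(template, {
  context: 'The warranty lasts 24 months.',
  question: 'How long is the warranty?',
});
```

When you edit QA_PROMPT, keep the `{context}` and `{question}` placeholders in place, since those are the names the template shown above relies on.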
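6. Add PDF documents as the knowledge base
Because data is exchanged with both OpenAI and Pinecone, consider data privacy and security carefully before uploading any documents.
Place one or more PDF documents in the docs directory.
Run the ingest command:
npm run ingest
Check in Pinecone that the upload succeeded.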
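7. Run the knowledge-base chatbot
Once you have verified that the embeddings and content were added to your Pinecone index, run npm run dev to start the local development environment, then type a question into the chat interface to start a conversation.
Run the command:
npm run dev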
8. Troubleshooting
https://github.com/mayooear/gpt4-pdf-chatbot-langchain#troubleshooting
In general, keep an eye out in the issues and discussions section of this repo for solutions.
General errors
- Make sure you're running the latest Node version. Run node -v
- Try a different PDF or convert your PDF to text first. It's possible your PDF is corrupted, scanned, or requires OCR to convert to text.
- Console.log the env variables and make sure they are exposed.
- Make sure you're using the same versions of LangChain and Pinecone as this repo.
- Check that you've created an .env file that contains your valid (and working) API keys, environment and index name.
- If you change modelName in OpenAI, make sure you have access to the API for the appropriate model.
- Make sure you have enough OpenAI credits and a valid card on your billing account.
- Check that you don't have multiple OpenAI API keys in your global environment. If you do, the local env file from the project will be overridden by the system env variable.
- Try to hard code your API keys into the process.env variables if there are still issues.
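One of the checks above, confirming that your key can reach the model you picked, can be done outside the repo with a single request to OpenAI's "retrieve model" endpoint (GET /v1/models/{model}). The sketch below is a minimal version under stated assumptions: `buildModelRequest`, `canAccessModel`, and the `FetchLike` type are hypothetical names, and the fetch function is passed in by the caller so the example stays self-contained. It treats a 200 status as "accessible" and anything else as "not accessible".

```typescript
// Minimal shape of the fetch function this sketch needs. This type is an
// assumption made to keep the example self-contained.
type FetchLike = (
  url: string,
  init: { headers: Record<string, string> },
) => Promise<{ status: number }>;

// Build the request for OpenAI's "retrieve model" endpoint.
function buildModelRequest(
  apiKey: string,
  model: string,
): { url: string; headers: Record<string, string> } {
  return {
    url: `https://api.openai.com/v1/models/${encodeURIComponent(model)}`,
    headers: { Authorization: `Bearer ${apiKey}` },
  };
}

// Hypothetical helper: resolves to true when the API answers 200 for the model,
// and to false for any other status.
async function canAccessModel(
  apiKey: string,
  model: string,
  fetchImpl: FetchLike,
): Promise<boolean> {
  const { url, headers } = buildModelRequest(apiKey, model);
  const response = await fetchImpl(url, { headers });
  return response.status === 200;
}
```

On Node 18 or later, the built-in `fetch` can be passed as `fetchImpl`, for example `canAccessModel(key, 'gpt-4', fetch)`. If the result is false, keep modelName set to gpt-3.5-turbo.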
Pinecone errors
- Make sure your pinecone dashboard environment and index matches the one in the pinecone.ts and .env files.
- Check that you've set the vector dimensions to 1536.
- Make sure your pinecone namespace is in lowercase.
- Pinecone indexes of users on the Starter(free) plan are deleted after 7 days of inactivity. To prevent this, send an API request to Pinecone to reset the counter before 7 days.
- Retry from scratch with a new Pinecone project, index, and cloned repo.
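Two of the Pinecone checks above (a lowercase namespace and a vector dimension of 1536) are easy to automate before you run the ingest step. The following is a hypothetical pre-flight sketch: `PineconeSettings` and `pineconeSettingProblems` are names invented for this example, and the values are supplied by you rather than read from Pinecone.

```typescript
// Hypothetical pre-flight check; the rules mirror the checklist above.
interface PineconeSettings {
  namespace: string; // value you set for PINECONE_NAME_SPACE
  dimension: number; // vector dimension configured on your index
}

// Dimension named in the checklist above.
const EXPECTED_DIMENSION = 1536;

// Return a human-readable list of problems; an empty list means both checks pass.
function pineconeSettingProblems(settings: PineconeSettings): string[] {
  const problems: string[] = [];
  if (settings.namespace !== settings.namespace.toLowerCase()) {
    problems.push('namespace must be lowercase');
  }
  if (settings.dimension !== EXPECTED_DIMENSION) {
    problems.push(
      `index dimension must be ${EXPECTED_DIMENSION}, got ${settings.dimension}`,
    );
  }
  return problems;
}
```

For example, `pineconeSettingProblems({ namespace: 'PDF-Docs', dimension: 768 })` reports both problems, while a lowercase namespace with dimension 1536 returns an empty list.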