This notebook demonstrates how the Elasticsearch self-query retriever converts a question into a structured query and applies that structured query to an Elasticsearch index.
Before we begin, we use langchain to create the documents, then use ElasticsearchStore.from_documents to create a vector store and index the data into Elasticsearch.
We then run some example queries that show what a self-query retriever backed by Elasticsearch can do.
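Installation
Install Elasticsearch and Kibana
If you have not yet installed Elasticsearch and Kibana, refer to the following articles:
- How to install Elasticsearch on Linux, macOS, and Windows
- Kibana: How to install Kibana from the Elastic Stack on Linux, macOS, and Windows
Choose Elastic Stack 8.x when installing. On first start, Elasticsearch 8.x prints its security setup information to the terminal, including the password generated for the elastic user. Note it down, because it is needed below.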
Python packages
We need Python 3.8 or later, since langchain does not support older versions. We also need to install the following Python packages:
python3 -m pip install -qU lark elasticsearch langchain openai
$ pwd
/Users/liuxg/python/elser
$ python3 -m pip install -qU lark elasticsearch langchain openai
$ pip3 list | grep elasticsearch
elasticsearch 8.11.1
rag-elasticsearch 0.0.1 /Users/liuxg/python/rag-elasticsearch/my-app/packages/rag-elasticsearch
In this exercise, we use Elastic Stack 8.11, the latest release at the time of writing.
Environment variables
Before starting Jupyter, set the following environment variables:
export ES_USER="elastic"
export ES_PASSWORD="yarOjyX5CLqTsKVE3v*d"
export ES_ENDPOINT="localhost"
export OPENAI_API_KEY="YOUR_OPEN_AI_KEY"
Replace the values above with your own. In particular, you must supply your own OPENAI_API_KEY.
Copy the Elasticsearch certificate
We copy the Elasticsearch certificate into the current directory:
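Since a missing variable only surfaces later as a failed connection or an authentication error, it can help to check the environment before launching Jupyter. The following is a minimal sketch, not part of the original notebook; the helper name missing_env_vars is hypothetical, and the variable names are taken from the export commands above.

```python
import os

# Variable names taken from the export commands above.
REQUIRED_VARS = ("ES_USER", "ES_PASSWORD", "ES_ENDPOINT", "OPENAI_API_KEY")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty.

    `env` defaults to the process environment; passing a plain dict
    makes the check easy to try out without touching real variables.
    """
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```

Running this in the same shell that will start Jupyter confirms that the notebook kernel inherits the values.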
$ pwd
/Users/liuxg/python/elser
$ cp ~/elastic/elasticsearch-8.11.0/config/certs/http_ca.crt .
overwrite ./http_ca.crt? (y/n [n]) y
$ ls http_ca.crt
http_ca.crt
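Before pointing the client at the copied file, a quick sanity check can catch a wrong path or an empty copy. The sketch below assumes that http_ca.crt is a PEM-encoded certificate, which is how Elasticsearch 8.x normally writes it; the function name looks_like_pem_certificate is hypothetical and the check is only a heuristic, not a validation of the certificate itself.

```python
from pathlib import Path


def looks_like_pem_certificate(path):
    """Return True if the file exists and contains a PEM certificate header.

    This only checks for the textual PEM marker; it does not verify
    that the certificate is valid or that it matches the server.
    """
    candidate = Path(path)
    if not candidate.is_file():
        return False
    text = candidate.read_text(encoding="utf-8", errors="replace")
    return "-----BEGIN CERTIFICATE-----" in text


if __name__ == "__main__":
    print(looks_like_pem_certificate("./http_ca.crt"))
```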
Create the application
Import Python packages
We create a Jupyter notebook in the current directory: Chatbot Example with Self Query Retriever.ipynb
from langchain.schema import Document
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import ElasticsearchStore
from langchain.llms import OpenAI
from langchain.retrievers.self_query.base import SelfQueryRetriever
from langchain.chains.query_constructor.base import AttributeInfo
Create the documents
Next, we use the langchain Document schema to create a list of documents containing movie summaries. Each document has a page_content and metadata.
docs = [
    Document(
        page_content="A bunch of scientists bring back dinosaurs and mayhem breaks loose",
        metadata={"year": 1993, "rating": 7.7, "genre": "science fiction", "director": "Steven Spielberg", "title": "Jurassic Park"},
    ),
    Document(
        page_content="Leo DiCaprio gets lost in a dream within a dream within a dream within a ...",
        metadata={"year": 2010, "director": "Christopher Nolan", "rating": 8.2, "title": "Inception"},
    ),
    Document(
        page_content="A psychologist / detective gets lost in a series of dreams within dreams within dreams and Inception reused the idea",
        metadata={"year": 2006, "director": "Satoshi Kon", "rating": 8.6, "title": "Paprika"},
    ),
    Document(
        page_content="A bunch of normal-sized women are supremely wholesome and some men pine after them",
        metadata={"year": 2019, "director": "Greta Gerwig", "rating": 8.3, "title": "Little Women"},
    ),
    Document(
        page_content="Toys come alive and have a blast doing so",
        metadata={"year": 1995, "genre": "animated", "director": "John Lasseter", "rating": 8.3, "title": "Toy Story"},
    ),
    Document(
        page_content="Three men walk into the Zone, three men walk out of the Zone",
        metadata={
            "year": 1979,
            "rating": 9.9,
            "director": "Andrei Tarkovsky",
            "genre": "science fiction",
            "title": "Stalker",
        },
    ),
]
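Not every document above carries every metadata field (several have no genre), and a filter on a missing field will simply not match that document. A small check like the following makes such gaps visible before indexing. It is a sketch on plain dictionaries; the helper name fields_missing is hypothetical and the two sample entries are copied from the list above.

```python
def fields_missing(metadata_list, required=("year", "rating", "genre", "director", "title")):
    """Map each title to the required metadata keys it lacks."""
    report = {}
    for meta in metadata_list:
        missing = [key for key in required if key not in meta]
        if missing:
            report[meta.get("title", "<untitled>")] = missing
    return report


# Two metadata dictionaries copied from the documents above.
sample = [
    {"year": 1993, "rating": 7.7, "genre": "science fiction",
     "director": "Steven Spielberg", "title": "Jurassic Park"},
    {"year": 2010, "director": "Christopher Nolan", "rating": 8.2, "title": "Inception"},
]

print(fields_missing(sample))  # {'Inception': ['genre']}
```

In the notebook, the same check could be run on the real data with fields_missing([doc.metadata for doc in docs]).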
Connect to Elasticsearch
We connect to the Elasticsearch cluster that we deployed locally. For background, see the earlier article "Elasticsearch: RAG with Open AI and Langchain - Retrieval Augmented Generation (3)".
import os
from dotenv import load_dotenv  # provided by the python-dotenv package
from elasticsearch import Elasticsearch
# Load variables from a local .env file if one exists; exported variables also work.
load_dotenv()
openai_api_key = os.getenv("OPENAI_API_KEY")
elastic_user = os.getenv("ES_USER")
elastic_password = os.getenv("ES_PASSWORD")
elastic_endpoint = os.getenv("ES_ENDPOINT")
elastic_index_name = "elastic-knn-search"
# Connect over HTTPS and verify the server against the copied CA certificate.
url = f"https://{elastic_user}:{elastic_password}@{elastic_endpoint}:9200"
connection = Elasticsearch(url, ca_certs="./http_ca.crt", verify_certs=True)
print(connection.info())
# Embed the documents with OpenAI and index them into Elasticsearch.
# OpenAIEmbeddings and ElasticsearchStore were imported in the first cell.
embeddings = OpenAIEmbeddings(openai_api_key=openai_api_key)
es = ElasticsearchStore.from_documents(
    docs,
    embedding=embeddings,
    es_url=url,
    es_connection=connection,
    index_name=elastic_index_name,
    es_user=elastic_user,
    es_password=elastic_password,
)
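The connection URL above embeds the user name and password directly. Characters such as @, / or : in a password would change how the URL is parsed, so one option is to percent-encode the credentials first. The sketch below uses only the standard library; build_es_url is a hypothetical helper, and it assumes that the client decodes percent-encoded credentials when it parses the URL. If that is in doubt, the 8.x client also accepts the credentials separately through its basic_auth=(user, password) argument, which avoids the question entirely.

```python
from urllib.parse import quote


def build_es_url(user, password, host, port=9200, scheme="https"):
    """Build a basic-auth URL with percent-encoded credentials.

    safe="" forces every reserved character (including "/") to be encoded.
    """
    encoded_user = quote(user, safe="")
    encoded_password = quote(password, safe="")
    return f"{scheme}://{encoded_user}:{encoded_password}@{host}:{port}"


# The password here is made up for illustration.
print(build_es_url("elastic", "p@ss/w:d", "localhost"))
# https://elastic:p%40ss%2Fw%3Ad@localhost:9200
```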
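Set up the self-query retriever
Next, we instantiate the self-query retriever by describing the metadata attributes of the documents and giving a short description of the document content.
We then create the retriever with SelfQueryRetriever.from_llm.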
metadata_field_info = [
    AttributeInfo(
        name="genre",
        description="The genre of the movie. Can be either 'science fiction' or 'animated'.",
        type="string or list[string]",
    ),
    AttributeInfo(
        name="year",
        description="The year the movie was released",
        type="integer",
    ),
    AttributeInfo(
        name="director",
        description="The name of the movie director",
        type="string",
    ),
    AttributeInfo(
        name="rating", description="A 1-10 rating for the movie", type="float"
    ),
]
document_content_description = "Brief summary of a movie"
# Set up openAI llm with sampling temperature 0
llm = OpenAI(temperature=0, openai_api_key=openai_api_key)
# instantiate retriever
retriever = SelfQueryRetriever.from_llm(
    llm, es, document_content_description, metadata_field_info, verbose=True
)
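To make the idea of a "structured query" concrete, the sketch below models one by hand: a free-text part for semantic search plus a list of metadata comparisons, which are then translated into an Elasticsearch bool filter. This is an illustration only and not langchain's implementation; the classes Comparison and StructuredQuery, the function to_es_filter, and the "metadata." field prefix are assumptions made for the example, and the exact request that langchain sends may differ.

```python
from dataclasses import dataclass, field


@dataclass
class Comparison:
    attribute: str   # metadata field name, e.g. "year"
    comparator: str  # one of: eq, gt, gte, lt, lte
    value: object


@dataclass
class StructuredQuery:
    query: str                                   # text used for semantic search
    filters: list = field(default_factory=list)  # Comparison objects, AND-ed together


def to_es_filter(structured_query, prefix="metadata."):
    """Translate AND-ed comparisons into an Elasticsearch bool filter."""
    clauses = []
    for comp in structured_query.filters:
        name = prefix + comp.attribute
        if comp.comparator == "eq":
            clauses.append({"term": {name: comp.value}})
        elif comp.comparator in ("gt", "gte", "lt", "lte"):
            clauses.append({"range": {name: {comp.comparator: comp.value}}})
        else:
            raise ValueError(f"unsupported comparator: {comp.comparator}")
    return {"bool": {"filter": clauses}}


# "Movies about dreams released after 2009 but before 2011", written by hand.
sq = StructuredQuery(
    query="dreams",
    filters=[Comparison("year", "gt", 2009), Comparison("year", "lt", 2011)],
)
print(to_es_filter(sq))
```

In the real retriever, the LLM produces the query text and the comparisons from the user's question, using the attribute descriptions in metadata_field_info as guidance.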
Answer questions with the self-query retriever
Now we demonstrate how to use the self-query retriever for RAG.
from langchain.chat_models import ChatOpenAI
from langchain.schema.runnable import RunnableParallel, RunnablePassthrough
from langchain.prompts import ChatPromptTemplate, PromptTemplate
from langchain.schema import format_document
LLM_CONTEXT_PROMPT = ChatPromptTemplate.from_template("""
Use the following context movies that matched the user question. Use the movies below only to answer the user's question.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
----
{context}
----
Question: {question}
Answer:
""")
DOCUMENT_PROMPT = PromptTemplate.from_template("""
---
title: {title}
year: {year}
director: {director}
---
""")
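# Render each retrieved document with DOCUMENT_PROMPT and join the results.
def _combine_documents(
    docs, document_prompt=DOCUMENT_PROMPT, document_separator="\n\n"
):
    doc_strings = [format_document(doc, document_prompt) for doc in docs]
    return document_separator.join(doc_strings)
# Run retrieval and pass the question through unchanged, in parallel.
_context = RunnableParallel(
    context=retriever | _combine_documents,
    question=RunnablePassthrough(),
)
# Retrieve context, fill the prompt, then ask the LLM.
chain = (_context | LLM_CONTEXT_PROMPT | llm)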
chain.invoke("What movies are about dreams and it was released after the year 2009 but before the year 2011?")
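To see what the LLM actually receives as context, the formatting step can be reproduced with plain string formatting. The sketch below mirrors DOCUMENT_PROMPT and _combine_documents without langchain; DOCUMENT_TEMPLATE and combine_metadata are hypothetical names, and the metadata is copied from the Inception document above. Note that the template uses only title, year, and director, so the summary text in page_content is not part of the context passed to the LLM.

```python
# Same fields as DOCUMENT_PROMPT above, without the surrounding blank lines.
DOCUMENT_TEMPLATE = "---\ntitle: {title}\nyear: {year}\ndirector: {director}\n---"


def combine_metadata(metadata_list, template=DOCUMENT_TEMPLATE, separator="\n\n"):
    """Render each metadata dict with the template and join the results.

    Extra keys such as "rating" are ignored by str.format.
    """
    return separator.join(template.format(**meta) for meta in metadata_list)


retrieved = [
    {"year": 2010, "director": "Christopher Nolan", "rating": 8.2, "title": "Inception"},
]
print(combine_metadata(retrieved))
```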
The code above can be downloaded from: https://github.com/liu-xiao-guo/semantic_search_es/blob/main/Chatbot%20Example%20with%20Self%20Query%20Retriever.ipynb