Continuing the LangChain series, this post reads a YouTube video's transcript and answers questions about it using indexes for information retrieval. Previous posts in the series:
- LangChain: naming an animal with an LLM
- LangChain 2: modularizing the prompt template and building a Streamlit site for the animal-naming app
- LangChain 3: using an Agent with Wikipedia and llm-math to calculate the average age of a dog
1. Install youtube-transcript-api, FAISS, and tiktoken
pip install youtube-transcript-api
pip install faiss-cpu
pip install tiktoken
faiss-cpu provides the FAISS vector database used below; tiktoken is needed by the OpenAI embeddings tokenizer.
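Before wiring everything into LangChain, it can help to confirm that youtube-transcript-api works on its own and that your OpenAI key is available (the helper below reads OPENAI_API_KEY from a .env file via load_dotenv). The following is an optional sanity check, a minimal sketch assuming the classic pre-1.0 youtube-transcript-api interface; the video ID comes from the URL used later in this post.
# Optional sanity check: fetch the raw captions directly (assumes youtube-transcript-api < 1.0)
from youtube_transcript_api import YouTubeTranscriptApi

video_id = "-Osca2Zax4Y"  # the ID portion of https://youtu.be/-Osca2Zax4Y
entries = YouTubeTranscriptApi.get_transcript(video_id)  # list of {'text', 'start', 'duration'} dicts
print(len(entries), "caption entries")
print(entries[0])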
2. Write the code that loads the video transcript and stores it in a FAISS vector database, file langchain_helper.py
# Import the required modules from the langchain package and other libraries
from langchain.document_loaders import YoutubeLoader  # loads transcript data from YouTube videos
from langchain.text_splitter import RecursiveCharacterTextSplitter  # splits long documents into chunks
from langchain.embeddings.openai import OpenAIEmbeddings  # generates embedding vectors with OpenAI
from langchain.vectorstores import FAISS  # efficient similarity search over large document sets
from langchain.llms import OpenAI  # access to OpenAI language models
from langchain import PromptTemplate  # templated prompts for the language model
from langchain.chains import LLMChain  # chains a prompt and a language model together
from dotenv import load_dotenv  # loads environment variables from a .env file

load_dotenv()  # load environment variables (e.g. OPENAI_API_KEY) from the .env file

embedding = OpenAIEmbeddings()  # embedding model used to embed the transcript chunks

# URL of the YouTube video
video_url = "https://youtu.be/-Osca2Zax4Y?si=iy0iePxzUy_bUayO"

def create_vector_db_from_youtube_url(video_url: str) -> FAISS:
    # Load the transcript of the YouTube video
    loader = YoutubeLoader.from_youtube_url(video_url)
    transcript = loader.load()

    # Split the transcript into smaller chunks
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    docs = text_splitter.split_documents(transcript)

    # Build a FAISS index from the document chunks
    db = FAISS.from_documents(docs, embedding)
    return db

# Example: create a vector database from the given YouTube URL
print(create_vector_db_from_youtube_url(video_url))
zgpeaces-MBP at ~/Workspace/LLM/langchain-llm-app (feature/infoRetrievel) $ python langchain_helper.py
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/langchain/__init__.py:39: UserWarning: Importing PromptTemplate from langchain root module is no longer supported.
warnings.warn(
<langchain.vectorstores.faiss.FAISS object at 0x11b1e96f0>
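The script above rebuilds the index (and calls the embeddings API) on every run. As an optional extension, not part of the original script, the LangChain FAISS wrapper can persist the index to disk with save_local and reload it with load_local; the folder name below is just an example, and on newer LangChain versions load_local may additionally require allow_dangerous_deserialization=True.
# Optional: cache the FAISS index on disk so the transcript is only embedded once
db = create_vector_db_from_youtube_url(video_url)
db.save_local("faiss_youtube_index")  # writes index.faiss and index.pkl into this folder

# In a later run, reload the cached index instead of rebuilding it
cached_db = FAISS.load_local("faiss_youtube_index", embedding)
print(cached_db.similarity_search("ransomware", k=2))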
3. Query the information in the vector database
Check which OpenAI model to use (see the models documentation linked in the references).
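If you are unsure which models your API key can call, you can also list them programmatically. This is a sketch that assumes the pre-1.0 openai Python SDK (the generation this LangChain release works with); newer SDK versions expose client.models.list() instead.
# List the models available to your OpenAI API key (assumes openai SDK < 1.0)
import os
import openai
from dotenv import load_dotenv

load_dotenv()
openai.api_key = os.getenv("OPENAI_API_KEY")

for model in openai.Model.list()["data"]:
    print(model["id"])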
3.1 Add a query method
# Import the required modules from the langchain package and other libraries
from langchain.document_loaders import YoutubeLoader  # loads transcript data from YouTube videos
from langchain.text_splitter import RecursiveCharacterTextSplitter  # splits long documents into chunks
from langchain.embeddings.openai import OpenAIEmbeddings  # generates embedding vectors with OpenAI
from langchain.vectorstores import FAISS  # efficient similarity search over large document sets
from langchain.llms import OpenAI  # access to OpenAI language models
from langchain import PromptTemplate  # structured prompts for the language model
from langchain.chains import LLMChain  # chains a prompt and a language model together
from dotenv import load_dotenv  # loads environment variables from a .env file

load_dotenv()  # load environment variables from the .env file

embedding = OpenAIEmbeddings()  # embedding model used to embed the transcript chunks

# URL of the YouTube video
video_url = "https://youtu.be/-Osca2Zax4Y?si=iy0iePxzUy_bUayO"

def create_vector_db_from_youtube_url(video_url: str) -> FAISS:
    # Load the transcript of the YouTube video
    loader = YoutubeLoader.from_youtube_url(video_url)
    transcript = loader.load()

    # Split the transcript into smaller chunks
    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    docs = text_splitter.split_documents(transcript)

    # Build a FAISS index from the document chunks
    db = FAISS.from_documents(docs, embedding)
    return db

def get_response_from_query(db, query, k=4):
    # Run a similarity search against the database for the given query
    docs = db.similarity_search(query, k=k)

    # Concatenate the content of the top-k documents
    docs_page_content = " ".join([d.page_content for d in docs])

    # Initialize an OpenAI language model
    llm = OpenAI(model="text-davinci-003")

    # Define the prompt template for the language model
    prompt = PromptTemplate(
        input_variables=["question", "docs"],
        template="""
        You are a helpful assistant that can answer questions about YouTube videos
        based on the video's transcript.

        Answer the following question: {question}
        By searching the following video transcript: {docs}

        Only use the factual information from the transcript to answer the question.

        If you feel like you don't have enough information to answer the question, say "I don't know".

        Your answers should be verbose and detailed.
        """,
    )

    # Create a language model chain with the prompt defined above
    chain = LLMChain(llm=llm, prompt=prompt)

    # Run the chain with the query and the concatenated documents
    response = chain.run(question=query, docs=docs_page_content)

    # Format the response by replacing newlines with spaces
    response = response.replace("\n", " ")
    return response, docs

# Example usage: create a vector database from the YouTube video URL
# print(create_vector_db_from_youtube_url(video_url))
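To try the helper without the Streamlit front end, a quick console test could look like the sketch below; the question string is only an example.
# Quick local test of the two helpers (example question)
db = create_vector_db_from_youtube_url(video_url)
answer, sources = get_response_from_query(db, "What did they say about ransomware?", k=4)
print(answer)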
3.2 A Streamlit app that takes the video URL and a question as input
main.py
import streamlit as st  # Streamlit, used to build the web app
import langchain_helper as lch  # our helper module with the langchain logic
import textwrap  # used to wrap the answer text

st.title("YouTube Assistant")  # title of the Streamlit page

# Use Streamlit's sidebar to hold the input form
with st.sidebar:
    # Create a form in the sidebar
    with st.form(key='my_form'):
        # Text area for the YouTube video URL
        youtube_url = st.sidebar.text_area(
            label="What is the YouTube video URL?",
            max_chars=50
        )
        # Text area for the question about the video
        query = st.sidebar.text_area(
            label="Ask me about the video?",
            max_chars=50,
            key="query"
        )
        # Button that submits the form
        submit_button = st.form_submit_button(label='Submit')

# Only run when both a question and a YouTube URL have been provided
if query and youtube_url:
    # Build the vector database from the YouTube video URL
    db = lch.create_vector_db_from_youtube_url(youtube_url)
    # Get the answer to the question from the vector database
    response, docs = lch.get_response_from_query(db, query)
    # Show the "Answer:" subheader
    st.subheader("Answer:")
    # Display the answer, wrapped at 85 characters per line
    st.text(textwrap.fill(response, width=85))
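Since get_response_from_query also returns the matching transcript chunks, you can optionally show the supporting context in the app. The snippet below is an optional addition (not part of the original main.py) meant to be appended inside the if query and youtube_url block, which is why it is indented.
    # Optional: show the transcript chunks that were retrieved as supporting context
    with st.expander("Transcript excerpts used for this answer"):
        for d in docs:
            st.write(d.page_content)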
Run the app:
$ streamlit run main.py
You can now view your Streamlit app in your browser.
Local URL: http://localhost:8501
Network URL: http://192.168.50.10:8501
For better performance, install the Watchdog module:
In the browser, enter the video URL and a question in the sidebar, for example:
What is the YouTube video URL?
https://youtu.be/-Osca2Zax4Y?si=iy0iePxzUy_bUayO
Ask me about the video?
What did they talk about Ransomware?
References:
- https://github.com/zgpeace/pets-name-langchain/tree/feature/infoRetrievel
- https://python.langchain.com/docs/integrations/document_loaders/youtube_transcript
- https://youtu.be/lG7Uxts9SXs?si=H1CISGkoYiKRSF5V
- https://engineering.fb.com/2017/03/29/data-infrastructure/faiss-a-library-for-efficient-similarity-search/
- https://platform.openai.com/docs/models/gpt-4-and-gpt-4-turbo
That concludes LangChain 4: storing a YouTube video transcript in a FAISS vector database and querying it for question answering (indexes for information retrieval).