

ChatGPT in Practice: Building a Customized AI Customer Service Bot with Embeddings

Posted 2 years ago · Author: itsc · Category: Toy Blog


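This article introduces the basic concepts of Embeddings and walks through a minimal but complete code example showing how they are used, so that you can build your own AI chatbot (an intelligent customer service agent). You can take the code and adapt it to your own needs.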

What problem do ChatGPT's Embeddings solve?

If you ask ChatGPT directly "What is langchain? If you do not know please do not answer.", it says it does not know. ChatGPT has no knowledge of anything after September 2021, and langchain is newer than that:

I’m sorry, but I don’t have any information on “langchain.” It appears to be a term that is not widely recognized or used in general knowledge.

With Embeddings, the same question gets an answer:

LangChain is a framework for developing applications powered by language models.

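With this technique we can ask questions about our own documents, which extends ChatGPT's knowledge and lets us build a customized AI customer service agent. For example, you can connect ChatGPT to your official website and have it answer users' questions based on the site's documentation.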

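Basic concepts behind Embeddings

What are Embeddings?

Before jumping into the code, here is a brief introduction to what Embeddings are. To understand Embeddings, we first need the concept of a "vector".

We can describe a thing along multiple dimensions. Sound, for example, can be described in the time domain and in the frequency domain (many readers will have heard of the Fourier transform). The more dimensions we use, the more fully we can describe a thing. Two things that lie close together in vector space tend to be more closely related, and vector spaces are easy to compute in, so we can judge how similar things are by computing with their vectors.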
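In natural language processing (NLP), Embeddings are a way of converting words or sentences into numeric vectors. These vectors capture the semantics of the words or sentences, which lets us perform mathematical operations on them. For example, we can compute the cosine similarity between two vectors to measure how semantically similar they are.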
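
To make the cosine similarity idea concrete, here is a minimal sketch using numpy. The three-dimensional vectors are made up for illustration only; real embeddings come from a model and have many more dimensions.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical 3-dimensional "embeddings", invented for this example.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
car = [0.0, 0.2, 0.9]

print(cosine_similarity(cat, kitten))  # close to 1: similar direction
print(cosine_similarity(cat, car))     # close to 0: nearly orthogonal
```

The article's code works with cosine distance rather than similarity; the distance is simply 1 minus the similarity, so a smaller distance means a closer match.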

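The Embeddings workflow explained

How do we get ChatGPT to answer questions about content it was never trained on? The workflow is shown below; a picture is worth a thousand words.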
[Figure: sequence diagram of the Embeddings workflow]
Step by step:

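First, obtain the embeddings of the local data. Because a single embeddings call is limited in the number of tokens it accepts, split the data into segments and call the API on each segment in turn to obtain embeddings for all of the data.
Next comes the question. The question text is embedded in the same way, yielding one embedding.
Then compute the distance between the question's embedding and the embeddings of all the data. Distance here means the distance between vectors. The few segments closest to the question become its "context" (in the diagram, for example, data2 and data3 are found to be the most relevant to the question).
With the context in hand, construct the real prompt: the context is attached to the question and both are sent to ChatGPT, which can now answer a question it could not answer before.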
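
The retrieval part of these steps can be sketched without any API call. In the sketch below, `toy_embed` and `VOCAB` are hypothetical stand-ins for a real embeddings model (they just count a few known words), and `top_k_chunks` is an illustrative name, not a function from the article's code.

```python
import numpy as np

# Hypothetical vocabulary for the toy embedding below.
VOCAB = ["langchain", "framework", "language", "models",
         "weather", "rain", "pandas", "dataframe"]


def toy_embed(text):
    """Stand-in for an embeddings API call: counts vocabulary words in the text."""
    words = text.lower().replace(".", " ").replace("?", " ").split()
    return np.array([words.count(w) for w in VOCAB], dtype=float)


def cosine_distance(a, b):
    """1 - cosine similarity; smaller means more similar."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0:
        return 1.0  # treat an all-zero vector as unrelated
    return 1.0 - float(np.dot(a, b) / denom)


def top_k_chunks(question, chunks, k=2):
    """Embed the question, rank chunks by distance, return the k nearest."""
    q = toy_embed(question)
    ranked = sorted(chunks, key=lambda c: cosine_distance(q, toy_embed(c)))
    return ranked[:k]


chunks = [
    "It may rain today according to the weather report.",
    "LangChain is a framework for applications powered by language models.",
    "A pandas DataFrame stores tabular data.",
]
print(top_k_chunks("What is langchain?", chunks, k=1))
```

With a real model the chunk embeddings would be computed once and cached, and only the question would be embedded at query time.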

In summary:

ChatGPT can answer something it does not know because we pass it the relevant context, and it takes the answer from that context. Which context to send is decided by computing vector distances.
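
Passing context to the model can be pictured as plain string assembly. The sketch below takes chunks already sorted nearest-first and adds them until a size budget is reached. `build_prompt` is an illustrative name, and the default word-count tokenizer is an assumption made to keep the example self-contained; a real tokenizer such as tiktoken would count differently.

```python
def rough_token_count(text):
    """Assumption: whitespace word count as a rough stand-in for a real tokenizer."""
    return len(text.split())


def build_prompt(question, ranked_chunks, max_len=50, count_tokens=rough_token_count):
    """Add nearest-first chunks until the budget is used up, then format the prompt."""
    selected, used = [], 0
    for chunk in ranked_chunks:
        used += count_tokens(chunk)
        if used > max_len:
            break  # the next chunk would exceed the context budget
        selected.append(chunk)
    context = "\n\n###\n\n".join(selected)
    return (
        "Answer the question based on the context below.\n\n"
        f"Context: {context}\n\n---\n\nQuestion: {question}\nAnswer:"
    )


ranked = [
    "LangChain is a framework for developing applications powered by language models.",
    "Unrelated filler text " * 30,  # 90 words, too large for the remaining budget
]
print(build_prompt("What is langchain?", ranked))
```

Because the chunks are added in order of increasing distance, the budget is spent on the most relevant text first.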

Embedding in practice: the code (Python)

Let's look at the actual code.

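Prerequisites

Python 3.6 or later.
An OpenAI API key, or a key from another provider of the API service.
The following Python packages installed: requests, beautifulsoup4, pandas, tiktoken, openai, numpy. The code below uses the pre-1.0 openai interface (openai.Embedding.create, openai.ChatCompletion.create, openai.embeddings_utils), so it needs an openai release earlier than 1.0. The trimmed code does not import requests or beautifulsoup4.
A private text dataset. This example uses a text file named langchainintro.txt, which contains some documentation from the langchain website. The documentation is recent, so ChatGPT will not know it, which makes it a good test.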

Code:

The code comes from the OpenAI website; I have modified and trimmed it.

import os
import numpy as np
import openai
import pandas as pd
import tiktoken
from ast import literal_eval
from openai.embeddings_utils import distances_from_embeddings
import traceback

tokenizer = tiktoken.get_encoding("cl100k_base")


def get_api_key():
    return os.getenv('OPENAI_API_KEY')


def set_openai_config():
    openai.api_key = get_api_key()
    # Third-party endpoint used by the author; remove this line to use the library's default endpoint
    openai.api_base = "https://openai.api2d.net/v1"


def remove_newlines(serie):
    serie = serie.str.replace('\n', ' ')
    serie = serie.str.replace('\\n', ' ')
    serie = serie.str.replace('  ', ' ')
    serie = serie.str.replace('  ', ' ')
    return serie


def load_text_files(file_name):
    with open(file_name, "r", encoding="UTF-8") as f:
        text = f.read()
    return text


def prepare_directory(dir_name="processed"):
    if not os.path.exists(dir_name):
        os.mkdir(dir_name)


def split_into_many(text, max_tokens):
    # Split the text into sentences
    sentences = text.split('. ')

    # Get the number of tokens for each sentence
    n_tokens = [len(tokenizer.encode(" " + sentence)) for sentence in sentences]

    chunks = []
    tokens_so_far = 0
    chunk = []

    # Loop through the sentences and tokens joined together in a tuple
    for sentence, token in zip(sentences, n_tokens):

        # If the number of tokens so far plus the number of tokens in the current sentence is greater
        # than the max number of tokens, then add the chunk to the list of chunks and reset
        # the chunk and tokens so far
        if tokens_so_far + token > max_tokens:
            chunks.append(". ".join(chunk) + ".")
            chunk = []
            tokens_so_far = 0

        # If a single sentence has more tokens than the max, cut it on token boundaries
        # (not characters) and emit each full-size part as its own chunk
        while token > max_tokens:
            ids = tokenizer.encode(" " + sentence)
            chunks.append(tokenizer.decode(ids[:max_tokens]).strip())
            sentence = tokenizer.decode(ids[max_tokens:]).strip()
            token = len(tokenizer.encode(" " + sentence))

        # Otherwise, add the sentence to the chunk and add the number of tokens to the total
        chunk.append(sentence)
        tokens_so_far += token + 1

    # Add the last chunk to the list of chunks
    if chunk:
        chunks.append(". ".join(chunk) + ".")

    return chunks


def shorten_texts(df, max_tokens):
    shortened = []

    # Loop through the dataframe
    for row in df.iterrows():
        # If the text is None, go to the next row
        if row[1]['text'] is None:
            continue

        # If the number of tokens is greater than the max number of tokens, split the text into chunks
        if row[1]['n_tokens'] > max_tokens:
            shortened += split_into_many(row[1]['text'], max_tokens)

        # Otherwise, add the text to the list of shortened texts
        else:
            shortened.append(row[1]['text'])

    df = pd.DataFrame(shortened, columns=['text'])
    df['n_tokens'] = df.text.apply(lambda x: len(tokenizer.encode(x)))

    return df


def create_embeddings(df):
    df['embeddings'] = df.text.apply(
        lambda x: openai.Embedding.create(input=x, engine='text-embedding-ada-002')['data'][0]['embedding'])
    df.to_csv('processed/embeddings.csv')
    return df


def load_embeddings():
    df = pd.read_csv('processed/embeddings.csv', index_col=0)
    df['embeddings'] = df['embeddings'].apply(literal_eval).apply(np.array)
    return df


def create_context(
        question, df, max_len=1800, size="ada"
):
    """
    Create a context for a question by finding the most similar context from the dataframe
    """
    # print(f'start create_context')
    # Get the embeddings for the question
    q_embeddings = openai.Embedding.create(input=question, engine='text-embedding-ada-002')['data'][0]['embedding']
    # print(f'q_embeddings:{q_embeddings}')

    # Get the distances from the embeddings
    df['distances'] = distances_from_embeddings(q_embeddings, df['embeddings'].values, distance_metric='cosine')
    # print(f'df[distances]:{df["distances"]}')

    returns = []
    cur_len = 0

    # Sort by distance and add the text to the context until the context is too long
    for i, row in df.sort_values('distances', ascending=True).iterrows():
        # print(f'i:{i}, row:{row}')
        # Add the length of the text to the current length
        cur_len += row['n_tokens'] + 4

        # If the context is too long, break
        if cur_len > max_len:
            break

        # Else add it to the text that is being returned
        returns.append(row["text"])

    # Return the context
    return "\n\n###\n\n".join(returns)


def answer_question(
        df,
        model="gpt-3.5-turbo",  # must be a chat model: the call below uses openai.ChatCompletion
        question="Am I allowed to publish model outputs to Twitter, without a human review?",
        max_len=1800,
        size="ada",
        debug=False,
        max_tokens=150,
        stop_sequence=None
):
    """
    Answer a question based on the most similar context from the dataframe texts
    """
    context = create_context(
        question,
        df,
        max_len=max_len,
        size=size,
    )
    # If debug, print the retrieved context
    if debug:
        print("Context:\n" + context)
        print("\n\n")

    prompt = f"Answer the question based on the context below, \n\nContext: {context}\n\n---\n\nQuestion: {question}\nAnswer:"
    messages = [
        {
            'role': 'user',
            'content': prompt
        }
    ]
    try:
        # Create a chat completion using the question and context
        response = openai.ChatCompletion.create(
            messages=messages,
            temperature=0,
            max_tokens=max_tokens,
            stop=stop_sequence,
            model=model,
        )
        return response["choices"][0]["message"]["content"]
    except Exception as e:
        # print stack
        traceback.print_exc()
        print(e)
        return ""


def main():
    # Set the API key
    set_openai_config()

    # Load the local data
    texts = []
    text = load_text_files("langchainintro.txt")
    texts.append(('langchainintro', text))
    prepare_directory("processed")

    # Create a dataframe with two columns, fname and text
    df = pd.DataFrame(texts, columns=['fname', 'text'])
    df['text'] = df.fname + ". " + remove_newlines(df.text)
    df.to_csv('processed/scraped.csv')

    # Count the tokens
    df.columns = ['title', 'text']
    df['n_tokens'] = df.text.apply(lambda x: len(tokenizer.encode(x)))
    # print(f'{df}')
    df = shorten_texts(df, 500)

    # If processed/embeddings.csv already exists, load it; otherwise create it
    if os.path.exists('processed/embeddings.csv'):
        df = load_embeddings()
    else:
        df = create_embeddings(df)

    print(f"What is langchain? If you do not know please do not answer.")
    ans = answer_question(df, model='gpt-3.5-turbo', question="What is langchain? If you do not know please do not answer.", debug=False)
    print(f'ans:{ans}')


if __name__ == '__main__':
    main()

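The code follows essentially the same flow as the sequence diagram. Note that the API key must be placed in an environment variable (OPENAI_API_KEY), or you can change the code to load it another way.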
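
The listing imports distances_from_embeddings from openai.embeddings_utils. As far as I know, that helper module is not shipped with openai 1.0 and later, so on a newer install the import fails. A minimal numpy sketch of an equivalent cosine-distance helper follows; `cosine_distances` is an illustrative name, not part of any library.

```python
import numpy as np


def cosine_distances(query_embedding, embeddings):
    """Return the cosine distance from one query vector to each vector in `embeddings`."""
    q = np.asarray(query_embedding, dtype=float)
    m = np.vstack([np.asarray(e, dtype=float) for e in embeddings])
    sims = (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
    return (1.0 - sims).tolist()


# Made-up 2-dimensional vectors, for illustration only.
print(cosine_distances([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]))
```

Under that assumption, the line in create_context would become df['distances'] = cosine_distances(q_embeddings, df['embeddings'].values), with the rest of the function unchanged.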

If you ask ChatGPT directly "What is langchain? If you do not know please do not answer.", it says it does not know:

I’m sorry, but I don’t have any information on “langchain.” It appears to be a term that is not widely recognized or used in general knowledge.

Running the code above, it gives an answer:

LangChain is a framework for developing applications powered by language models.

You can see that it used the documentation we provided to answer.

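Further notes

Watch your token consumption: if you have a lot of local data, the embedding stage will consume a large number of tokens.
The embedding stage still sends your local data to the API provider, so take care if you have privacy requirements.
Production systems usually store the vectors in a vector database rather than in a local file; a plain file is used here only for demonstration.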
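
To illustrate the last point, here is a toy in-memory store that keeps vectors in a numpy array and saves them in numpy's binary .npz format instead of a CSV of stringified lists. `TinyVectorStore` is a hypothetical class written for this example. It is not a real vector database: it has no indexing and no handling of an empty store.

```python
import os
import tempfile

import numpy as np


class TinyVectorStore:
    """Hypothetical stand-in for a vector database: texts plus a 2-D float array."""

    def __init__(self):
        self.texts = []
        self.vectors = np.empty((0, 0))

    def add(self, text, vector):
        v = np.asarray(vector, dtype=float).reshape(1, -1)
        self.vectors = v if not self.texts else np.vstack([self.vectors, v])
        self.texts.append(text)

    def search(self, query_vector, k=3):
        """Return the k texts whose vectors have the highest cosine similarity."""
        q = np.asarray(query_vector, dtype=float)
        norms = np.linalg.norm(self.vectors, axis=1) * np.linalg.norm(q)
        sims = (self.vectors @ q) / norms
        order = np.argsort(-sims)[:k]
        return [self.texts[i] for i in order]

    def save(self, path):
        # `path` should end in .npz; numpy appends the suffix otherwise
        np.savez(path, texts=np.array(self.texts), vectors=self.vectors)

    @classmethod
    def load(cls, path):
        data = np.load(path)
        store = cls()
        store.texts = data["texts"].tolist()
        store.vectors = data["vectors"]
        return store


store = TinyVectorStore()
store.add("about cats", [0.9, 0.1])  # made-up 2-dimensional vectors
store.add("about cars", [0.1, 0.9])
path = os.path.join(tempfile.mkdtemp(), "store.npz")
store.save(path)
print(TinyVectorStore.load(path).search([1.0, 0.0], k=1))  # ['about cats']
```

A dedicated vector database adds what this sketch lacks, such as approximate nearest-neighbour indexes, so that search stays fast as the number of chunks grows.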
