

NLP / LLMs: Translation and Commentary on the "Zeno Chatbot Report" (a CMU professor's detailed evaluation of seven ChatGPT-style large models: GPT-2, LLaMa, Alpaca, Vicuna, MPT-Chat, Cohere Command, and ChatGPT)

Contents

Translation and Commentary on the "Zeno Chatbot Report": A CMU Professor's Detailed Evaluation of Seven ChatGPT-style Large Models

Overview

Setup

Model Settings

Evaluation Metrics

Further Analysis

Results

How well do models perform overall?

Accuracy by Gold-standard Response Length

How important is the context window?

How important is the prompt?

Discovered Errors (and possible mitigations)

Hallucinations

Failure to Probe

Repeated Content

Correct

Final Words


Translation and Commentary on the "Zeno Chatbot Report": A CMU Professor's Detailed Evaluation of Seven ChatGPT-style Large Models

Authors

Alex Cabrera and Graham Neubig, Carnegie Mellon University

Date

May 18, 2023

Link

zeno-build/tasks/chatbot/report at main · zeno-ml/zeno-build · GitHub

Overview

Large language models (LLMs) are taking the world by storm, and one big application for them is chat, with applications in question answering, customer service, and many others. However, chatbots are notoriously hard to evaluate, and there still isn’t a clear sense about which of the recent models are best to use in what situations.

In this report, we demonstrate some first results on evaluating and comparing recent chatbots, with the goal of making it easier for people to understand the current lay-of-the-land with respect to all of the open-source and API-based models coming out recently. In particular, we create a new open-source toolkit for evaluating LLMs, Zeno Build. This combines (1) a unified interface to use open-source LLMs through Hugging Face or online APIs, (2) an online interface for browsing and analyzing results using Zeno, and (3) state-of-the-art evaluation metrics for text using Critique.

Browse the results here

Highlights:

  1. We evaluated 7 language models: GPT-2, LLaMa, Alpaca, Vicuna, MPT-Chat, Cohere Command, and ChatGPT (gpt-3.5-turbo)
  2. The models were evaluated on their ability to create human-like responses on a customer service dataset
  3. ChatGPT came out on top, but the open-source chat model Vicuna was also very competitive
  4. We find that it is important to use a chat-tuned model with a long context window
  5. Prompt engineering particularly improves performance for turns early in the conversation, but less so in later turns where more context is available
  6. Even for a strong model like ChatGPT, it is easy to find obvious issues in hallucinations, failure to probe for more information, and repeated content

Read on for more detail, try out Zeno Build if you want to play around yourself, and we very much welcome additional contributions! To get in touch, open an issue on the issues page, jump in the Zeno discord, or get in contact via email.

Setup

Model Settings

We use the DSTC11 customer service dataset, which includes customer service interactions between agents and customers. We test 7 models:

  1. GPT-2: A classic language model from 2019. We added this as a baseline to see how much the recent progress in language modeling has made a difference in building better chat models.
  2. LLaMa: A language model originally trained by Meta AI that uses a straight-up language modeling objective. We use the 7B model for this and all following open-source models.
  3. Alpaca: A model based on LLaMa that additionally uses instruction tuning.
  4. Vicuna: A model based on LLaMa that is further explicitly tuned for chatbot-based applications.
  5. MPT-Chat: A model trained from scratch in a way similar to Vicuna, which has a more commercially permissive license.
  6. Cohere Command: An API-based model by Cohere that is tuned for following commands.
  7. ChatGPT (gpt-3.5-turbo): The standard-bearer of API-based chat models by OpenAI.

For all models by default we use a temperature of 0.3, context window of 4 previous chat turns, and a standard prompt saying “You are a chatbot tasked with making small-talk with people.” (with other ablations below).
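These defaults can be sketched as a small helper that assembles an OpenAI-style request. This is a minimal illustration, not the actual Zeno Build code; the function and variable names are our own:

```python
# Illustrative sketch of the report's default settings: temperature 0.3,
# a context window of the 4 previous chat turns, and the standard prompt.

SYSTEM_PROMPT = "You are a chatbot tasked with making small-talk with people."
TEMPERATURE = 0.3
CONTEXT_WINDOW = 4  # number of previous chat turns to include

def build_request(history, context_window=CONTEXT_WINDOW):
    """Build an OpenAI-style message list from the full chat history,
    keeping only the most recent `context_window` turns."""
    messages = [{"role": "system", "content": SYSTEM_PROMPT}]
    for role, text in history[-context_window:]:
        messages.append({"role": role, "content": text})
    return {"messages": messages, "temperature": TEMPERATURE}

history = [
    ("user", "Hi, I need help with my claim."),
    ("assistant", "Sure, what is your policy number?"),
    ("user", "It's 12345."),
    ("assistant", "Thanks! What happened?"),
    ("user", "My car was damaged in a storm."),
]
request = build_request(history)
print(len(request["messages"]))  # system prompt + last 4 turns -> 5
```

With a 5-turn history and a window of 4, the first user turn is dropped, which is exactly the trade-off the context-window ablation below measures.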

Evaluation Metrics

We evaluated the models based on how similar their outputs are to human customer service responses. This was done using metrics provided by the Critique toolkit:

  1. chrf: Measures the overlap of character strings
  2. BERTScore: Measures overlap of embeddings between the two utterances
  3. UniEval Coherence: Predicts how coherent the outputs are with the previous chat turn

We also measured length ratio, which simply measures the length of the output divided by the length of the gold-standard human response, indicating how verbose the chatbot is.
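To make the first and last of these concrete, here are rough, self-contained sketches of a character n-gram F-score (a simplified stand-in for chrF) and the length ratio. The report itself uses the Critique toolkit, so these versions are for illustration only:

```python
# Simplified chrF-style metric and length ratio (illustrative only).
from collections import Counter

def char_ngrams(text, n):
    """Multiset of character n-grams in `text`."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def simple_chrf(hypothesis, reference, max_n=6, beta=2.0):
    """Character n-gram F-beta score, averaged over n = 1..max_n."""
    precisions, recalls = [], []
    for n in range(1, max_n + 1):
        hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
        overlap = sum((hyp & ref).values())  # multiset intersection
        if sum(hyp.values()) and sum(ref.values()):
            precisions.append(overlap / sum(hyp.values()))
            recalls.append(overlap / sum(ref.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta**2) * p * r / (beta**2 * p + r)

def length_ratio(hypothesis, reference):
    """Output length divided by gold-standard length (verbosity)."""
    return len(hypothesis) / max(len(reference), 1)

gold = "Thank you for calling, how can I help?"
print(round(simple_chrf(gold, gold), 2))   # identical strings -> 1.0
print(round(length_ratio("Thanks!", gold), 2))  # -> 0.18 (much shorter)
```

In practice you would use a maintained implementation (e.g. the chrF in sacreBLEU) rather than a hand-rolled one; the point here is only what the metric measures.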

Further Analysis

To dig deeper into the results, we used the Zeno analysis interface. Specifically, we used its report generator to subdivide the examples based on the position in the conversation (start, early, middle, and late) and the length of the gold-standard human response (short, medium, and long), and its exploration interface to look through examples with bad automatic scores and to better understand where each of the models is failing.

We also did ablation studies on the Vicuna model, trying different context windows and prompts in the analysis.
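The slicing described above can be approximated offline. Here is a minimal sketch; the length buckets follow the boundaries stated later in the report, but the function names and the exact position thresholds are our own assumptions (Zeno does this interactively):

```python
# Bucketing examples by conversation position and gold-response length.

def position_bucket(turn_index, total_turns):
    """Classify a turn as start / early / middle / late."""
    if turn_index == 0:
        return "start"
    frac = turn_index / max(total_turns - 1, 1)
    if frac < 1 / 3:
        return "early"
    if frac < 2 / 3:
        return "middle"
    return "late"

def length_bucket(gold_response):
    """Short (<=35 chars), medium (36-70), long (>=71), as in the report."""
    n = len(gold_response)
    if n <= 35:
        return "short"
    if n <= 70:
        return "medium"
    return "long"

print(position_bucket(0, 12), length_bucket("Thanks for calling!"))
# -> start short
```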

Results

How well do models perform overall?

According to all of these metrics, gpt-3.5-turbo was the clear winner, and Vicuna was the open-source winner. GPT-2 and LLaMa were not very good, demonstrating the importance of training directly on chat.

These rankings also approximately match those of the lmsys chat arena, which uses human A/B testing to compare models, but Zeno Build’s results were obtained without any human ratings.

With regards to verbosity, gpt-3.5-turbo is far more verbose than the others, and it seems that models tuned for chat tend to be verbose in general.

Rankings from the lmsys chat arena: https://chat.lmsys.org/

The arena uses the Elo rating system to compute the relative performance of models.
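For reference, a single Elo update works like this. This is a minimal sketch; the K-factor and starting ratings are illustrative choices, not the arena's actual configuration:

```python
# One Elo rating update after an A-vs-B comparison.

def elo_update(rating_a, rating_b, score_a, k=32):
    """Update two ratings after one head-to-head comparison.
    score_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a tie."""
    expected_a = 1 / (1 + 10 ** ((rating_b - rating_a) / 400))
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta

a, b = elo_update(1000, 1000, 1.0)  # A beats an equally rated B
print(round(a), round(b))           # -> 1016 984
```

Because the expected score depends on the rating gap, upsets move ratings more than expected wins, which is what lets the arena converge on a stable ranking from pairwise human votes.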

Accuracy by Gold-standard Response Length

Next, we used the Zeno report UI to dig deeper. First, we measure accuracy separately by short (≤35 characters), medium (36-70 characters), and long (≥71 characters) human responses.

gpt-3.5-turbo and Vicuna maintain accuracy even on longer chat turns while others drop off.

How important is the context window?

We experimented using Vicuna with context windows ranging from 1-4 previous utterances. As we increase the context window, the performance goes up, indicating that larger context windows are important.

Longer context is particularly important in the middle and later parts of the conversation, where responses are less templated and more dependent on what was said previously.

More context is particularly important when trying to generate outputs where the gold standard is shorter (possibly because there is more ambiguity).

How important is the prompt?

We tried 5 different prompts - 4 generic ones and one specifically tailored to the task of customer service chat in the insurance domain:

  1. Standard: “You are a chatbot tasked with making small-talk with people.”
  2. Friendly: “You are a kind and friendly chatbot tasked with making small-talk with people in a way that makes them feel pleasant.”
  3. Polite: “You are an exceedingly polite chatbot that speaks very formally and tries to not make any missteps in your responses.”
  4. Cynical: “You are a cynical chatbot that has a very dark view of the world and in general likes to point out any possible problems.”
  5. Insurance: “You are an agent at the Rivertown Insurance helpdesk that mainly helps with resolving insurance claims.”

Overall, the prompt didn’t make a very large measurable difference, but the “cynical” chatbot was a little bit worse, and the tailored “insurance” chatbot was a little bit better overall.

The differences were especially stark on the first turn of the conversation, indicating that the prompt is most important when there is little other context to work with.

Discovered Errors (and possible mitigations)

Finally, we used Zeno's exploration UI to try to find possible errors by gpt-3.5-turbo, the best-performing model. Specifically, we looked at all examples that had low chrf (<0.1) and looked through them manually to find trends.

Hallucinations

Sometimes the model generates factually incorrect statements, particularly based on providing false customer information or information about the company policies. This would need to be solved by adding more information about the customer into the prompt, or looking up company policies and referring to them when answering specific questions.

Failure to Probe

Sometimes the model fails to probe for more information when it's actually necessary, such as failing to keep listening for a number when the number is not yet complete. This could possibly be mitigated by modifying the prompt to remind the model of the required shape for certain pieces of information (e.g. a phone number must be 10 digits).
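As a toy illustration of checking the "required shape" of a field: the report only suggests prompt-level reminders, so this validation helper is our own hypothetical addition, not part of the described system:

```python
# Decide whether a captured phone number is complete before acting on it.
import re

def is_complete_phone_number(text, digits_required=10):
    """True if the text contains exactly `digits_required` digits."""
    return len(re.findall(r"\d", text)) == digits_required

print(is_complete_phone_number("412-268-3"))       # -> False (keep probing)
print(is_complete_phone_number("(412) 268-3000"))  # -> True
```

A chatbot wrapper could use a check like this to decide between asking a follow-up question ("Could you read me the rest of the number?") and moving on.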

Repeated Content

Sometimes the same content is repeated multiple times, such as the bot saying “thank you” twice here.
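A simple post-hoc heuristic could flag such repetition. This is our own sketch, not something the report implements:

```python
# Flag a response whose sentences repeat verbatim (case-insensitive).

def has_repeated_sentences(response):
    """True if any sentence appears more than once in the response."""
    sentences = [s.strip().lower() for s in response.split(".") if s.strip()]
    return len(sentences) != len(set(sentences))

print(has_repeated_sentences("Thank you. Thank you."))  # -> True
print(has_repeated_sentences("Thank you. Goodbye."))    # -> False
```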

Correct

Sometimes the response is reasonable, but just different than the human response.

Final Words

We hope this report was helpful! If you want to try other models, other datasets, other prompts, or other hyperparameter settings, jump over to the chatbot example on the zeno-build repository to try it out. We'll be happy to discuss more and answer any questions via email, Discord, or GitHub issues.
