
A 2023 Introduction to Deep Learning (6) - Running Large Models on Your Own Computer

Author: Jtag特工



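In the previous article we covered the foundations of large models: the self-attention mechanism and its implementation, the Transformer module. Because frameworks such as PyTorch and TensorFlow support the Transformer, all we need to do is set up the framework's support for a GPU or other accelerator, and it will run.

Running a large model, however, is not that easy, and you may well need a Linux machine. Today's popular AI software generally depends on a large number of open-source tools, and when optimization is involved you will often have to build them from source. Once open-source software and compilation enter the picture, doing it on Windows turns into hard mode.

Most developers work on open-source systems themselves and pay little attention to Windows support, if any. Cygwin, MinGW, CMake, and WSL have all put considerable effort into supporting Linux open-source libraries on Windows, but just as Linux does not have as many games as Windows, this is an ecosystem problem.

Let's start with a few projects that are reasonably Windows-friendly, so that Windows users can also try a large model on their own machine.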

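Nomic AI gpt4all (based on LLaMA)

After ChatGPT burst onto the scene at the end of 2022, Meta felt that OpenAI had strayed from the "open" in its name, and so it partially opened up its own large model, LLaMA. It is only partially open because the network weight files have to be requested from Meta.
LLaMA was trained mainly on English material, along with some other languages written in the Latin and Cyrillic alphabets. Its tokenizer can handle Chinese and Japanese, but no Chinese or Japanese material was used in training.

Since LLaMA is not open to everyone, there is little point in walking through it here. We can, however, try projects built on top of LLaMA, such as Nomic AI's gpt4all.

A nice touch of gpt4all is that it ships builds for Windows, M1 Mac, and Intel Mac, with Linux supported by default of course. Better still, inference runs on the CPU alone.

Let's get it running.

First, download the gpt4all code: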

git clone https://github.com/nomic-ai/gpt4all

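Second, download the quantized network weights file: https://the-eye.eu/public/AI/models/nomic-ai/gpt4all/gpt4all-lora-quantized.bin

Third, put the downloaded gpt4all-lora-quantized.bin in the chat directory of gpt4all.

Fourth, run the gpt4all-lora-quantized executable. On Windows, for example, that is gpt4all-lora-quantized-win64.exe. You can launch it from PowerShell or simply double-click it.

Once the model has finished loading, we can chat with gpt4all:

[Figure: screenshot of a chat session with gpt4all]

Below is the C++ quicksort code that gpt4all generated, reproduced as generated: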

Please generate quick sort algorithm in C++ language
Here’s how you can implement Quick Sort Algorithm using C++ :

// Implementation of QuickSort() function for array A[] with size N and pivot value P[N/2]
void QS(int *A, int n) { // Function prototype declaration
    if (n <=1 ) return;   /* If there is only one element in the input Array */
        // QuickSort() function for array A[] with size N and pivot value P[N/2]
        else{                      //else statement to check condition of quick sort algorithm.
            int i = n-1, j=0;               /* Indexes used in the loop*/
             while (i>= 0) {              /* Loop for sorting elements */
                if(A[j]> A[i]){ // Swap condition to check element's position.
                    int temp = A[i];       // Copying value of array index i into a temporary variable 'temp'.
                    A[i]=A[j];             /* Exchanging values at two indices */
                }else{// Swap condition to check element's position.
                    int temp = A[j];       // Copying value of array index j into a temporary variable 'temp'.
                    A[i]=A[j], A[j] = temp; /* Exchanging values at two indices */
                }               i--, j++;             /* Decrement and increment loop indexes*/
            }; // End of while Loop for sorting elements.
        }//End if statement to check condition QuickSort() function call with array A[] size N & pivot value P[N/2].

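Congratulations! You have finally run a pretrained large model on your own computer, the same remarkable kind of model as ChatGPT.

Although this model is considerably less capable than ChatGPT (the code above, for example, is not a working quicksort), its base model comes from LLaMA, part of its training data is drawn from ChatGPT conversation logs, and it uses low-rank adaptation (LoRA) and quantization to shrink the model. Each of these is a cutting-edge technique today.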
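To make the two size-reduction ideas a little more concrete, here is a minimal pure-Python sketch. It is not gpt4all's code: the 4096 x 4096 layer size, the rank of 8, the sample weights, and all function names are assumptions chosen for illustration. The first function counts how many parameters LoRA trains compared with full fine-tuning of one weight matrix, and the other two show a simple symmetric 4-bit quantization round trip.

```python
# Illustrative sketch only: the sizes, rank, and function names below are
# assumptions chosen for the example, not values taken from gpt4all.

def lora_param_counts(d_out, d_in, rank):
    """Return (full, lora) trainable-parameter counts for one weight matrix.

    Full fine-tuning updates the whole d_out x d_in matrix W.
    LoRA freezes W and trains B (d_out x rank) and A (rank x d_in),
    using W + B @ A as the effective weight.
    """
    full = d_out * d_in
    lora = d_out * rank + rank * d_in
    return full, lora


def quantize_4bit(values):
    """Symmetric 4-bit quantization: map floats to integers in [-7, 7]."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    codes = [max(-7, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


full, lora = lora_param_counts(4096, 4096, rank=8)
print(f"full: {full:,}  LoRA: {lora:,}  ratio: {full / lora:.0f}x")

weights = [0.82, -0.31, 0.05, -0.97, 0.44]
codes, scale = quantize_4bit(weights)
restored = dequantize(codes, scale)
print(codes)
print([round(w, 3) for w in restored])
```

Each restored weight differs from the original by at most half a quantization step, which is the accuracy being traded for the smaller file.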

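Tsinghua's single-GPU large model: ChatGLM-6B

Tsinghua's ChatGLM-6B is likewise a large model designed to run on a single GPU. Its foundation is the ChatGLM model with 130 billion parameters (GLM-130B).

The figure below shows Stanford University's October 2022 evaluation of the major large models of the time:
[Figure: Stanford's October 2022 evaluation of the major large models]

Judging from the data in the figure, ChatGLM trails InstructGPT, the model family behind ChatGPT, but it compares well against the other models.

Let's see how the compressed version of ChatGLM performs.
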

First, download the code:

git clone https://github.com/THUDM/ChatGLM-6B

Then install the libraries it depends on:

pip install -r requirements.txt

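The Python programming interface of gpt4all mentioned above does not support Windows, whereas ChatGLM-6B is built on the Hugging Face Transformers library and supports Python programming on Windows. Let's write some code to call it:

from transformers import AutoTokenizer, AutoModel

# trust_remote_code=True lets transformers run the model code shipped in the THUDM repository.
tokenizer = AutoTokenizer.from_pretrained("THUDM/chatglm-6b", trust_remote_code=True)
# Load the 4-bit quantized checkpoint, cast it to half precision, and move it to the GPU.
model = AutoModel.from_pretrained("THUDM/chatglm-6b-int4-qe", trust_remote_code=True).half().cuda()
# Switch to inference mode.
model = model.eval()
# Prompt: "Implement quicksort in C++"
response, history = model.chat(tokenizer, "用C++實現快速排序", history=[])
print(response)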

The output is as follows, reproduced as generated:


#include <iostream>
using namespace std;

void swap(int& a, int& b) {
    int temp = a;
    a = b;
    b = temp;
}

int partition(int arr[], int low, int high) {
    int pivot = arr[high];
    int i = (low - 1);

    for (int j = low; j < high - 1; j++) {
        if (arr[j] < pivot) {
            i++;
            swap(arr[i], arr[j]);
        }
    }
    swap(arr[i+1], arr[high]);
    return i+1;
}

void quicksort(int arr[], int low, int high) {
    if (low < high) {
        int pi = partition(arr, low, high);
        quicksort(arr, low, pi - 1);
        quicksort(arr, pi + 1, high);
    }
}

int main() {
    int arr[] = {5, 2, 9, 1, 6, 3, 8};
    int n = sizeof(arr) / sizeof(arr[0]);

    quicksort(arr, 0, n-1);

    cout << arr[0] << endl;
    return 0;
}

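Not bad, is it? It is starting to feel a bit like ChatGPT. The output still needs review, though: the loop condition `j < high - 1` in `partition` skips the element just before the pivot (it should be `j < high`), and `main` prints only the first element.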
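The generated C++ above follows the Lomuto partition scheme, but its loop condition `j < high - 1` stops one element short of the pivot. As a sanity check, here is a minimal Python port in which the loop visits every index from `low` to `high - 1`; the port is my own sketch for verification, not part of the model's output.

```python
# A Python port of the Lomuto partition scheme, written for checking only.

def partition(arr, low, high):
    """Partition arr[low..high] around the pivot arr[high]; return its final index."""
    pivot = arr[high]
    i = low - 1
    # j must visit every index from low to high - 1 inclusive.
    for j in range(low, high):
        if arr[j] < pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1


def quicksort(arr, low, high):
    """Sort arr[low..high] in place."""
    if low < high:
        pi = partition(arr, low, high)
        quicksort(arr, low, pi - 1)
        quicksort(arr, pi + 1, high)


data = [5, 2, 9, 1, 6, 3, 8]
quicksort(data, 0, len(data) - 1)
print(data)
```

A three-element input such as [2, 1, 3] is enough to expose the original bound: with `j < high - 1` the element 1 is never compared with the pivot and ends up after it.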

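If GPU support for PyTorch is installed, the `.cuda()` call in the code above runs this inference on the GPU. I chose 4-bit quantization, which uses the least GPU memory; if you have a better graphics card, you can pick a model with a lower compression ratio.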
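A back-of-the-envelope calculation shows why the precision matters so much for GPU memory. The sketch below counts weights only, and the parameter count of 6 billion is an assumption read off the "6B" in the model name; real usage is higher because of activations, caches, and framework overhead.

```python
# Rough weights-only estimate. The parameter count is an assumption read off
# the "6B" in the model name; real usage is higher because of activations,
# caches, and framework overhead.

def weight_memory_gib(n_params, bits_per_weight):
    """Memory needed to store n_params weights at the given precision, in GiB."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


N_PARAMS = 6_000_000_000  # assumed, approximate

for name, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"{name}: {weight_memory_gib(N_PARAMS, bits):.1f} GiB")
```

Going from 16-bit to 4-bit weights cuts the weight storage to a quarter, which is what brings a model of this size within reach of consumer graphics cards.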

This brings us to the gateway of the Transformer era, Hugging Face. The transformers library we imported in the code above is a Hugging Face product.

from transformers import AutoTokenizer, AutoModel

[Figure: screenshot of the Hugging Face website]

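As the figure above shows, Hugging Face is essentially the hub for all kinds of Transformer models. Through the Hugging Face interface you can use practically every open-source large model.

How large models are forged

Although the network weights must be requested, the model code of Meta's LLaMA is open source. Let's see how LLaMA's Transformer differs from the standard Transformer we built in the previous article: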

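# Excerpt from LLaMA's model.py. ModelArgs, TransformerBlock, RMSNorm and
# precompute_freqs_cis are defined elsewhere in the same file.
import torch
from torch import nn
from fairscale.nn.model_parallel.layers import (
    ColumnParallelLinear,
    ParallelEmbedding,
)

class Transformer(nn.Module):
    def __init__(self, params: ModelArgs):
        super().__init__()
        self.params = params
        self.vocab_size = params.vocab_size
        self.n_layers = params.n_layers

        # Token embedding table, sharded across model-parallel workers.
        self.tok_embeddings = ParallelEmbedding(
            params.vocab_size, params.dim, init_method=lambda x: x
        )

        # Stack of n_layers identical Transformer blocks.
        self.layers = torch.nn.ModuleList()
        for layer_id in range(params.n_layers):
            self.layers.append(TransformerBlock(layer_id, params))

        # Final normalization, then a projection back to vocabulary logits.
        self.norm = RMSNorm(params.dim, eps=params.norm_eps)
        self.output = ColumnParallelLinear(
            params.dim, params.vocab_size, bias=False, init_method=lambda x: x
        )

        # Precomputed rotary position embedding frequencies.
        self.freqs_cis = precompute_freqs_cis(
            self.params.dim // self.params.n_heads, self.params.max_seq_len * 2
        )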

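As we can see, to support parallel execution, Meta's fully connected layers use ColumnParallelLinear from its own fairscale library. The token embedding layer is likewise a parallel version.

As the layer count dictates, it then stacks n_layers TransformerBlock modules.

Now let's look at this block: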
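The idea behind a column-parallel linear layer can be shown with a toy example. The sketch below is not fairscale's implementation: the "workers" are just slices of one weight matrix inside a single process, and the function names are made up for illustration. It demonstrates that splitting the weight matrix by output columns, multiplying each shard separately, and concatenating the results gives the same answer as one full matrix multiplication.

```python
# Toy illustration of column parallelism; this is not fairscale's code.
# Each "worker" here is just a slice of the weight matrix in one process.

def matmul(x, w):
    """Multiply an (n x k) matrix by a (k x m) matrix, both as nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w, n_workers):
    """Split the columns of w into n_workers equal shards."""
    m = len(w[0])
    assert m % n_workers == 0, "output dimension must divide evenly"
    step = m // n_workers
    return [[row[s:s + step] for row in w] for s in range(0, m, step)]


def column_parallel_linear(x, w, n_workers):
    """Compute x @ w by letting each worker handle its own column shard."""
    partial_outputs = [matmul(x, shard) for shard in split_columns(w, n_workers)]
    # Gather: concatenate each worker's output columns row by row.
    return [sum((part[i] for part in partial_outputs), []) for i in range(len(x))]


x = [[1.0, 2.0], [3.0, 4.0]]                        # 2 tokens, dim 2
w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]    # dim 2 -> 4 outputs

print(matmul(x, w))
print(column_parallel_linear(x, w, n_workers=2))
```

In a real multi-GPU setup each shard lives on a different device, so no single device has to hold the whole weight matrix.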

from typing import Optional

class TransformerBlock(nn.Module):
    def __init__(self, layer_id: int, args: ModelArgs):
        super().__init__()
        self.n_heads = args.n_heads
        self.dim = args.dim
        self.head_dim = args.dim // args.n_heads
        self.attention = Attention(args)
        # The feed-forward hidden size starts from 4x the model dimension.
        self.feed_forward = FeedForward(
            dim=args.dim, hidden_dim=4 * args.dim, multiple_of=args.multiple_of
        )
        self.layer_id = layer_id
        # One RMSNorm before attention and one before the feed-forward layer.
        self.attention_norm = RMSNorm(args.dim, eps=args.norm_eps)
        self.ffn_norm = RMSNorm(args.dim, eps=args.norm_eps)

    def forward(self, x: torch.Tensor, start_pos: int, freqs_cis: torch.Tensor, mask: Optional[torch.Tensor]):
        # Pre-norm residual: normalize, apply the sublayer, then add the input back.
        h = x + self.attention.forward(self.attention_norm(x), start_pos, freqs_cis, mask)
        out = h + self.feed_forward.forward(self.ffn_norm(h))
        return out
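The block above normalizes with RMSNorm before each sublayer and then adds the input back. As a rough illustration of those two steps, here is a pure-Python sketch on plain lists. It is not LLaMA's code: the function names, the eps value, and the two stand-in sublayers are assumptions made for the example.

```python
import math

# Illustrative sketch; names, eps, and the stand-in sublayers are assumptions.

def rms_norm(x, weight, eps=1e-6):
    """Scale x by the reciprocal of its root mean square, then by a learned gain."""
    mean_sq = sum(v * v for v in x) / len(x)
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    return [v * inv_rms * g for v, g in zip(x, weight)]


def pre_norm_block(x, attention, feed_forward, attn_gain, ffn_gain):
    """Mirror the forward() above: normalize first, then add the residual."""
    h = [a + b for a, b in zip(x, attention(rms_norm(x, attn_gain)))]
    return [a + b for a, b in zip(h, feed_forward(rms_norm(h, ffn_gain)))]


def halve(v):
    """Hypothetical stand-in for the attention sublayer."""
    return [0.5 * t for t in v]


def negate(v):
    """Hypothetical stand-in for the feed-forward sublayer."""
    return [-t for t in v]


x = [3.0, 4.0]
gain = [1.0, 1.0]
print(rms_norm(x, gain))
print(pre_norm_block(x, halve, negate, gain, gain))
```

Unlike LayerNorm, RMSNorm does not subtract the mean; it only rescales, so the normalized vector has a mean square close to 1.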

We find that it does not use a standard multi-head attention module but implements its own attention class.

class Attention(nn.Module):
    def __init__(self, args: ModelArgs):
        super().__init__()

        # Heads are divided evenly among the model-parallel workers.
        self.n_local_heads = args.n_heads // fs_init.get_model_parallel_world_size()
        self.head_dim = args.dim // args.n_heads

        self.wq = ColumnParallelLinear(
            args.dim,
            args.n_heads * self.head_dim,
            bias=False,
            gather_output=False,
            init_method=lambda x: x,
        )
        self.wk = ColumnParallelLinear(
            args.dim,
            args.n_heads * self.head_dim,
            bias=False,
            gather_output=False,
            init_method=lambda x: x,
        )
        self.wv = ColumnParallelLinear(
            args.dim,
            args.n_heads * self.head_dim,
            bias=False,
            gather_output=False,
            init_method=lambda x: x,
        )
        self.wo = RowParallelLinear(
            args.n_heads * self.head_dim,
            args.dim,
            bias=False,
            input_is_parallel=True,
            init_method=lambda x: x,
        )

        # Preallocated key/value caches, so earlier positions are not
        # recomputed during generation.
        self.cache_k = torch.zeros(
            (args.max_batch_size, args.max_seq_len, self.n_local_heads, self.head_dim)
        ).cuda()
        self.cache_v = torch.zeros(
            (args.max_batch_size, args.max_seq_len, self.n_local_heads, self.head_dim)
        ).cuda()

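So after all that, it is just multi-head attention with parallel support and an added cache. Q, K, and V are wearing a different outfit, but in essence it is still multi-head self-attention.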
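To see why the cache does not change the result, consider a single attention head during token-by-token generation. The sketch below is a simplified illustration in pure Python, not LLaMA's implementation: the class and function names and the toy vectors are assumptions. It compares attention that re-reads all earlier keys and values at every step with a version that appends only the newest key and value to a cache.

```python
import math

# Simplified single-head illustration; names and toy data are assumptions.

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(q, keys, values):
    """Scaled dot-product attention for one query vector."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]


def full_recompute(qs, ks, vs):
    """Causal attention without a cache: position t re-reads ks[:t+1], vs[:t+1]."""
    return [attend(qs[t], ks[:t + 1], vs[:t + 1]) for t in range(len(qs))]


class CachedAttention:
    """Keeps the keys and values of earlier positions, like cache_k / cache_v."""

    def __init__(self):
        self.cache_k = []
        self.cache_v = []

    def step(self, q, k, v):
        # Only the new token's k and v are appended; old ones are reused.
        self.cache_k.append(k)
        self.cache_v.append(v)
        return attend(q, self.cache_k, self.cache_v)


qs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ks = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]
vs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

cached = CachedAttention()
incremental = [cached.step(q, k, v) for q, k, v in zip(qs, ks, vs)]
print(incremental)
print(full_recompute(qs, ks, vs))
```

In a real model the keys and values come from projecting hidden states, so the cache saves recomputing those projections for every earlier token at each generation step.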

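Other interesting projects

LMFlow

LMFlow is another project that has been very popular recently. Built by the Hong Kong University of Science and Technology on top of LLaMA, it is a fully open-source, end-to-end pipeline that can be trained on a single 3090 GPU.

Its repository is at: https://github.com/OptimalScale/LMFlow

What makes LMFlow distinctive at the moment is that the pipeline it provides is fairly complete.

For example, among current open-source projects, LMFlow is one of the few that provides instruction tuning.

Here is an instruction tuning example: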

{"id": 0, "instruction": "The sentence you are given might be too wordy, complicated, or unclear. Rewrite the sentence and make your writing clearer by keeping it concise. Whenever possible, break complex sentences into multiple sentences and eliminate unnecessary words.", "input": "If you have any questions about my rate or if you find it necessary to increase or decrease the scope for this project, please let me know.", "infer30b_before_item": " Output: The sentence you are given might be too wordy, complicated, or unclear. Rewrite the sentence and make your writing clearer by keeping it concise. Whenever possible, break complex sentences into multiple sentences and eliminate unnecessary words. If you have any questions about my rate or if you find it necessary to increase or decrease the scope for this project, please let me know.\n---\nInput: Input: The sentence you are given might be too wordy, complicated, or unclear. Rewrite the sentence and make your writing clearer by keeping it concise. Whenever possible, break complex sentences into multiple sentences and eliminate unnecessary words. If you have any questions about my rate or if you find it necessary to increase or decrease the scope for this project, please let me know.\n Output: Output: The sentence you are given might be too wordy, complicated, or unclear. Rewrite the sentence and make your writing clearer by keeping it concise. Whenever possible, break complex sentences into multiple sentences and eliminate unnecessary words. If you have any questions about my rate or if you find it necessary to increase or decrease the scope for this project, please let me know.\n---\nInput: Input: The sentence you are given might be too wordy, complicated,", "infer30b_after_item": " \n Output: If you have any questions about my rate or need to adjust the scope for this project, please let me know. \n\n", "infer13b_before_item": " The sentence you are given might be too wordy, complicated, or unclear. 
Rewrite the sentence and make your writing clearer by keeping it concise. Whenever possible, break complex sentences into multiple sentences and eliminate unnecessary words. If you have any questions about my rate or if you find it necessary to increase or decrease the scope for this project, please let me know.\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n", "infer13b_after_item": " \n Output: If you have any questions about my rate or if you find it necessary to increase or decrease the scope for this project, please let me know. \n\n", "infer7b_before_item": " The sentence you are given might be too wordy, complicated, or unclear. Rewrite the sentence and make your writing clearer by keeping it concise. Whenever possible, break complex sentences into multiple sentences and eliminate unnecessary words. If you have any questions about my rate or if you find it necessary to increase or decrease the scope for this project, please let me know.\nInput: The sentence you are given might be too wordy, complicated, or unclear. Rewrite the sentence and make your writing clearer by keeping it concise. Whenever possible, break complex sentences into multiple sentences and eliminate unnecessary words. If you have any questions about my rate or if you find it necessary to increase or decrease the scope for this project, please let me know.\nOutput: The sentence you are given might be too wordy, complicated, or unclear. Rewrite the sentence and make your writing clearer by keeping it concise. Whenever possible, break complex sentences into multiple sentences and eliminate unnecessary words. 
If you have any questions about my rate or if you find it necessary to increase or decrease the scope for this project, please let me know.\nInput: The sentence you are given might be too wordy, complicated, or unclear. Rewrite the sentence and make your writing clearer by", "infer7b_after_item": " \n Output: If you have any questions about my rate or if you find it necessary to increase or decrease the scope for this project, please let me know. \n\n"}

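This shows us how the correction is actually done: before tuning, the models mostly echo the prompt back or emit runs of newlines, and after tuning each one returns a single Output sentence. This is what plain LLaMA lacks.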
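For readers who want to experiment with records of this shape, here is a small sketch that turns the "instruction" and "input" fields into one prompt string. The template is hypothetical and chosen only for illustration, LMFlow's own templates may differ, and the record is an abbreviated stand-in for the example above.

```python
import json

# Hypothetical prompt template, chosen for illustration; LMFlow's own
# templates may differ.
TEMPLATE = "Instruction: {instruction}\nInput: {input}\nOutput:"


def build_prompt(record):
    """Turn one instruction/input record into a single prompt string."""
    return TEMPLATE.format(instruction=record["instruction"], input=record["input"])


# Abbreviated stand-in for the record shown above.
line = json.dumps({
    "id": 0,
    "instruction": "Rewrite the sentence to be more concise.",
    "input": "If you have any questions about my rate, please let me know.",
})

record = json.loads(line)
print(build_prompt(record))
```

During instruction tuning, a prompt like this is paired with the desired answer so that the model learns to respond to the instruction instead of continuing the text.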

HuggingGPT

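Recently, a team from Zhejiang University and Microsoft released the Jarvis project, which takes full advantage of Hugging Face's position as a central hub.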


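Unfortunately, the two projects above, along with the more advanced uses of the earlier ones, are hard to get working on Windows. We will cover these Linux-only experiments together in a later article.

Summary

By pruning, rank reduction, quantization, and similar techniques, we can run inference for large models on resource-constrained computers. There is some loss in quality, of course, so weigh the trade-off against your use case; if prompt engineering alone solves the problem, so much the better.
Hugging Face is both the programming interface for pretrained large models and the hub where they are distributed.
The underlying principle of large models is still the self-attention model we studied in the previous article.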
