- References:
  https://zhuanlan.zhihu.com/p/636784644
  https://spaces.ac.cn/archives/8265 (《Transformer升級之路:2、博采眾長的旋轉(zhuǎn)式位置編碼》, the author's derivation of RoPE)
Preface: the code read in this post comes from the transformers library, specifically modeling_llama.py, located at transformers/models/llama/modeling_llama.py.
1. LlamaModel overall structure flowchart
- At a high level, LlamaModel is a token embedding layer followed by a stack of LlamaDecoderLayer blocks (each one: RMSNorm, self-attention with rotary position embeddings, RMSNorm, MLP, with residual connections around both sub-blocks) and a final LlamaRMSNorm; the components below are read in that order.
2. LlamaRMSNorm
- The code is as follows:
import torch
from torch import nn

class LlamaRMSNorm(nn.Module):
    def __init__(self, hidden_size, eps=1e-6):
        """
        LlamaRMSNorm is equivalent to T5LayerNorm
        """
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))  # learnable per-channel scale
        self.variance_epsilon = eps

    def forward(self, hidden_states):
        input_dtype = hidden_states.dtype
        # mean of x_i^2 over the hidden dimension, computed in float32 for stability
        variance = hidden_states.to(torch.float32).pow(2).mean(-1, keepdim=True)
        hidden_states = hidden_states * torch.rsqrt(variance + self.variance_epsilon)
        return (self.weight * hidden_states).to(input_dtype)
- The RMSNorm formula is:

$$\mathrm{RMSNorm}(x)_i = \frac{x_i}{\sqrt{\frac{1}{n}\sum\limits_{i=1}^{n} x_i^2 + eps}} \cdot weight_i$$

- The correspondence between the formula and the code: variance computes the mean of x_i^2 over the hidden dimension (with n = hidden_size), torch.rsqrt(variance + self.variance_epsilon) supplies the 1/sqrt(... + eps) factor, and self.weight is the learnable weight_i.
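To check this correspondence numerically, here is a minimal sketch (it assumes PyTorch is available and the LlamaRMSNorm class above has been defined; the tensor shapes are arbitrary toy values):

import torch

# Toy input: batch=2, seq_len=3, hidden_size=8 (made-up sizes for illustration).
x = torch.randn(2, 3, 8)
norm = LlamaRMSNorm(hidden_size=8, eps=1e-6)

# Manual computation of x_i / sqrt(mean(x_i^2) + eps) * weight_i.
manual = x / torch.sqrt(x.pow(2).mean(-1, keepdim=True) + 1e-6) * norm.weight

print(torch.allclose(norm(x), manual, atol=1e-6))  # expected: True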
3. LlamaMLP
- The code is as follows:
from transformers.activations import ACT2FN  # maps an activation name (e.g. "silu") to its module

class LlamaMLP(nn.Module):
    def __init__(
        self,
        hidden_size: int,
        intermediate_size: int,
        hidden_act: str,
    ):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.act_fn = ACT2FN[hidden_act]

    def forward(self, x):
        # gated branch: act_fn(gate_proj(x)); linear branch: up_proj(x); multiply, then project back down
        return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
- Flowchart: the input x goes through gate_proj followed by the activation on one branch and through up_proj on the other branch; the two branches are multiplied elementwise and then mapped back by down_proj.
- The input is x and the output is y; a shape walkthrough is sketched after this list.
- In the code, intermediate_size is generally larger than hidden_size. Printing the Llama-13B model in a Jupyter notebook shows, for example, hidden_size=5120 and intermediate_size=13824.
- Summary: the MLP module is just a combination of a few nn.Linear layers plus a gated activation.
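To make the x-to-y flow concrete, a minimal usage sketch (it assumes the LlamaMLP class above is defined and that ACT2FN is importable from transformers.activations; the toy sizes are made up, the Llama-13B sizes appear only in the comment):

import torch

# Toy sizes for illustration; Llama-13B itself uses hidden_size=5120,
# intermediate_size=13824 and hidden_act="silu".
mlp = LlamaMLP(hidden_size=16, intermediate_size=44, hidden_act="silu")

x = torch.randn(2, 5, 16)         # [batch, seq_len, hidden_size]
h = mlp.act_fn(mlp.gate_proj(x))  # [2, 5, 44], gated branch
u = mlp.up_proj(x)                # [2, 5, 44], linear branch
y = mlp.down_proj(h * u)          # [2, 5, 16], projected back to hidden_size

print(torch.allclose(mlp(x), y))  # expected: True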
4. LlamaRotaryEmbedding
- The code is as follows:
class LlamaRotaryEmbedding(torch.nn.Module):
    def __init__(self, dim, max_position_embeddings=2048, base=10000, device=None):
        super().__init__()
        # one frequency per pair of dimensions: 1 / base**(2i/dim), i = 0 .. dim/2 - 1
        inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float().to(device) / dim))
        self.register_buffer("inv_freq", inv_freq)

        # Build here to make `torch.jit.trace` work.
        self.max_seq_len_cached = max_position_embeddings
        t = torch.arange(self.max_seq_len_cached, device=self.inv_freq.device, dtype=self.inv_freq.dtype)
        freqs = torch.einsum("i,j->ij", t, self.inv_freq)  # outer product of positions and frequencies, [seq_len, dim/2]
        # Different from paper, but it uses a different permutation in order to obtain the same calculation
        emb = torch.cat((freqs, freqs), dim=-1)
        self.register_buffer("cos_cached", emb.cos()[None, None, :, :], persistent=False)
        self.register_buffer("sin_cached", emb.sin()[None, None, :, :], persistent=False)

    def forward(self, x, seq_len=None):
        # x: [bs, num_attention_heads, seq_len, head_size]
        # This `if` block is unlikely to be run after we build sin/cos in `__init__`. Keep the logic here just in case.
        if seq_len > self.max_seq_len_cached:
            self.max_seq_len_cached = seq_len
            t = torch.arange(self.max_seq_len_cached, device=x.device, dtype=self.inv_freq.dtype)
            freqs = torch.einsum("i,j->ij", t, self.inv_freq)
            # Different from paper, but it uses a different permutation in order to obtain the same calculation
            emb = torch.cat((freqs, freqs), dim=-1).to(x.device)
            self.register_buffer("cos_cached", emb.cos()[None, None, :, :], persistent=False)
            self.register_buffer("sin_cached", emb.sin()[None, None, :, :], persistent=False)
        return (
            self.cos_cached[:, :, :seq_len, ...].to(dtype=x.dtype),
            self.sin_cached[:, :, :seq_len, ...].to(dtype=x.dtype),
        )
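Before looking at how this cache is consumed, a quick sketch of what the module stores and returns (assuming the class above is defined; dim=8 and the sequence lengths are arbitrary toy values):

import torch

rope = LlamaRotaryEmbedding(dim=8, max_position_embeddings=16)

# inv_freq holds dim/2 frequencies: 1 / base**(2i/dim) for i = 0 .. dim/2 - 1.
print(rope.inv_freq.shape)    # torch.Size([4])

x = torch.randn(1, 2, 10, 8)  # [bs, num_heads, seq_len, head_dim]
cos, sin = rope(x, seq_len=10)
print(cos.shape, sin.shape)   # torch.Size([1, 1, 10, 8]) for both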
- In actual use, two additional helper functions are called, shown below:
def rotate_half(x):
    """Rotates half the hidden dims of the input."""
    x1 = x[..., : x.shape[-1] // 2]
    x2 = x[..., x.shape[-1] // 2 :]
    return torch.cat((-x2, x1), dim=-1)

def apply_rotary_pos_emb(q, k, cos, sin, position_ids):
    # The first two dimensions of cos and sin are always 1, so we can `squeeze` them.
    cos = cos.squeeze(1).squeeze(0)  # [seq_len, dim]
    sin = sin.squeeze(1).squeeze(0)  # [seq_len, dim]
    cos = cos[position_ids].unsqueeze(1)  # [bs, 1, seq_len, dim]
    sin = sin[position_ids].unsqueeze(1)  # [bs, 1, seq_len, dim]
    q_embed = (q * cos) + (rotate_half(q) * sin)
    k_embed = (k * cos) + (rotate_half(k) * sin)
    return q_embed, k_embed
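Putting the pieces together, a minimal usage sketch (toy shapes; in the real attention module, q and k come from the query and key projections and position_ids comes from the input batch):

import torch

bs, n_heads, seq_len, head_dim = 1, 2, 6, 8
q = torch.randn(bs, n_heads, seq_len, head_dim)
k = torch.randn(bs, n_heads, seq_len, head_dim)

rope = LlamaRotaryEmbedding(dim=head_dim, max_position_embeddings=32)
cos, sin = rope(q, seq_len=seq_len)                # [1, 1, seq_len, head_dim]
position_ids = torch.arange(seq_len).unsqueeze(0)  # [1, seq_len]

q_rot, k_rot = apply_rotary_pos_emb(q, k, cos, sin, position_ids)
print(q_rot.shape, k_rot.shape)  # both torch.Size([1, 2, 6, 8])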
- Note that this implementation differs slightly from the original derivation: rotate_half pairs dimension i with dimension i + d/2, so each (x_i, x_{i+d/2}) pair is rotated by the angle m·θ_i for a token at position m.
- The original derivation instead rotates adjacent pairs (x_{2i}, x_{2i+1}); as the code comment notes, the two layouts differ only by a permutation of the hidden dimensions and yield the same calculation. For the full derivation, see the author's blog post listed in the references (https://spaces.ac.cn/archives/8265).
- Summary: RoPE injects position information into K and Q separately, right before their inner product is computed in attention; a small check of this property follows.
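To illustrate the summary, a small sketch (reusing the class and functions above) that checks the key property of RoPE: after rotation, the Q-K inner product depends only on the relative offset between two positions, not on their absolute values:

import torch

head_dim = 8
rope = LlamaRotaryEmbedding(dim=head_dim, max_position_embeddings=64)
cos, sin = rope(torch.randn(1, 1, 64, head_dim), seq_len=64)

q = torch.randn(1, 1, 1, head_dim)  # a single query vector
k = torch.randn(1, 1, 1, head_dim)  # a single key vector

def score(m, n):
    # Rotate q to position m and k to position n, then take their inner product.
    q_m, _ = apply_rotary_pos_emb(q, q, cos, sin, torch.tensor([[m]]))
    k_n, _ = apply_rotary_pos_emb(k, k, cos, sin, torch.tensor([[n]]))
    return (q_m * k_n).sum()

# The same relative offset (5) at different absolute positions gives the same score.
print(torch.allclose(score(2, 7), score(30, 35), atol=1e-5))  # expected: True

This is why injecting the rotation into Q and K separately still yields attention scores that encode relative position.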
The end.