??????Hugging Face 實(shí)戰(zhàn)系列 總目錄
有任何問題歡迎在下面留言
本篇文章的代碼運(yùn)行界面均在notebook中進(jìn)行
本篇文章配套的代碼資源已經(jīng)上傳
下篇內(nèi)容:
Hugging Face實(shí)戰(zhàn)-系列教程4:padding與attention_mask文章來源地址http://www.zghlxwxcb.cn/news/detail-702790.html
?輸出我們需要幾個(gè)輸出呢?比如說這個(gè)cls分類,我們做一個(gè)10分類,可以嗎?對每一個(gè)詞做10分類可以嗎?預(yù)測下一個(gè)詞是什么可以嗎?是不是也可以!
在我們的NLP任務(wù)中,相比圖像任務(wù)有分類有回歸,NLP有回歸這一說嗎?我們要做的所有任務(wù)都是分類,就是把分類做到哪兒而已,不管做什么都是分類。
比如我們剛剛導(dǎo)入的兩個(gè)英語句子,是對序列做情感分析,就是一個(gè)二分類,用序列做分類,你想導(dǎo)什么輸出頭,你就導(dǎo)入什么東西就可以了,簡不簡單?好簡單是不是,上代碼:
from transformers import AutoModelForSequenceClassification
checkpoint = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
outputs = model(**inputs)
print(outputs.logits.shape)
導(dǎo)入一個(gè)序列分類的包,還是選擇checkpoint這個(gè)名字,選擇分詞器,導(dǎo)入模型,將模型打印一下:
DistilBertForSequenceClassification(
(distilbert): DistilBertModel(
(embeddings): Embeddings(
(word_embeddings): Embedding(30522, 768, padding_idx=0)
(position_embeddings): Embedding(512, 768)
(LayerNorm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(dropout): Dropout(p=0.1, inplace=False)
)
(transformer): Transformer(
(layer): ModuleList(
(0): TransformerBlock(
(attention): MultiHeadSelfAttention(
(dropout): Dropout(p=0.1, inplace=False)
(q_lin): Linear(in_features=768, out_features=768, bias=True)
(k_lin): Linear(in_features=768, out_features=768, bias=True)
(v_lin): Linear(in_features=768, out_features=768, bias=True)
(out_lin): Linear(in_features=768, out_features=768, bias=True)
)
(sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(ffn): FFN(
(dropout): Dropout(p=0.1, inplace=False)
(lin1): Linear(in_features=768, out_features=3072, bias=True)
(lin2): Linear(in_features=3072, out_features=768, bias=True)
)
(output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
)
(1): TransformerBlock(
(attention): MultiHeadSelfAttention(
(dropout): Dropout(p=0.1, inplace=False)
(q_lin): Linear(in_features=768, out_features=768, bias=True)
(k_lin): Linear(in_features=768, out_features=768, bias=True)
(v_lin): Linear(in_features=768, out_features=768, bias=True)
(out_lin): Linear(in_features=768, out_features=768, bias=True)
)
(sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(ffn): FFN(
(dropout): Dropout(p=0.1, inplace=False)
(lin1): Linear(in_features=768, out_features=3072, bias=True)
(lin2): Linear(in_features=3072, out_features=768, bias=True)
)
(output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
)
(2): TransformerBlock(
(attention): MultiHeadSelfAttention(
(dropout): Dropout(p=0.1, inplace=False)
(q_lin): Linear(in_features=768, out_features=768, bias=True)
(k_lin): Linear(in_features=768, out_features=768, bias=True)
(v_lin): Linear(in_features=768, out_features=768, bias=True)
(out_lin): Linear(in_features=768, out_features=768, bias=True)
)
(sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(ffn): FFN(
(dropout): Dropout(p=0.1, inplace=False)
(lin1): Linear(in_features=768, out_features=3072, bias=True)
(lin2): Linear(in_features=3072, out_features=768, bias=True)
)
(output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
)
(3): TransformerBlock(
(attention): MultiHeadSelfAttention(
(dropout): Dropout(p=0.1, inplace=False)
(q_lin): Linear(in_features=768, out_features=768, bias=True)
(k_lin): Linear(in_features=768, out_features=768, bias=True)
(v_lin): Linear(in_features=768, out_features=768, bias=True)
(out_lin): Linear(in_features=768, out_features=768, bias=True)
)
(sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(ffn): FFN(
(dropout): Dropout(p=0.1, inplace=False)
(lin1): Linear(in_features=768, out_features=3072, bias=True)
(lin2): Linear(in_features=3072, out_features=768, bias=True)
)
(output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
)
(4): TransformerBlock(
(attention): MultiHeadSelfAttention(
(dropout): Dropout(p=0.1, inplace=False)
(q_lin): Linear(in_features=768, out_features=768, bias=True)
(k_lin): Linear(in_features=768, out_features=768, bias=True)
(v_lin): Linear(in_features=768, out_features=768, bias=True)
(out_lin): Linear(in_features=768, out_features=768, bias=True)
)
(sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(ffn): FFN(
(dropout): Dropout(p=0.1, inplace=False)
(lin1): Linear(in_features=768, out_features=3072, bias=True)
(lin2): Linear(in_features=3072, out_features=768, bias=True)
)
(output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
)
(5): TransformerBlock(
(attention): MultiHeadSelfAttention(
(dropout): Dropout(p=0.1, inplace=False)
(q_lin): Linear(in_features=768, out_features=768, bias=True)
(k_lin): Linear(in_features=768, out_features=768, bias=True)
(v_lin): Linear(in_features=768, out_features=768, bias=True)
(out_lin): Linear(in_features=768, out_features=768, bias=True)
)
(sa_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
(ffn): FFN(
(dropout): Dropout(p=0.1, inplace=False)
(lin1): Linear(in_features=768, out_features=3072, bias=True)
(lin2): Linear(in_features=3072, out_features=768, bias=True)
)
(output_layer_norm): LayerNorm((768,), eps=1e-12, elementwise_affine=True)
)
)
)
)
(pre_classifier): Linear(in_features=768, out_features=768, bias=True)
(classifier): Linear(in_features=768, out_features=2, bias=True)
(dropout): Dropout(p=0.2, inplace=False)
)
看看多了什么?前面我們說對每一個(gè)詞生成一個(gè)768向量,最后就連了兩個(gè)全連接層:
(pre_classifier): Linear(in_features=768, out_features=768, bias=True)
(classifier): Linear(in_features=768, out_features=2, bias=True)
(dropout): Dropout(p=0.2, inplace=False)
這個(gè)logits就是輸出結(jié)果了:
print(outputs.logits.shape)
torch.Size([2, 2])
這個(gè)2*2表示的就是樣本為2(兩個(gè)英語句子),分類是2分類,但是我們需要得到最后的分類概率,再加上softmax:
import torch
predictions = torch.nn.functional.softmax(outputs.logits, dim=-1)
print(predictions)
dim=-1就是沿著最后一個(gè)維度進(jìn)行計(jì)算,最后返回的就是概率值:
tensor([[1.5446e-02, 9.8455e-01], [9.9946e-01, 5.4418e-04]], grad_fn=SoftmaxBackward0)
概率知道了,類別的概率是什么呢?調(diào)一個(gè)內(nèi)置的id to label配置:
model.config.id2label
{0: 'NEGATIVE', 1: 'POSITIVE'}
也就是說,第一個(gè)句子負(fù)面情感的概率為1.54%,正面的概率情感為98.46%文章來源:http://www.zghlxwxcb.cn/news/detail-702790.html
下篇內(nèi)容:
Hugging Face實(shí)戰(zhàn)-系列教程4:padding與attention_mask
到了這里,關(guān)于Hugging Face實(shí)戰(zhàn)-系列教程3:AutoModelForSequenceClassification文本2分類的文章就介紹完了。如果您還想了解更多內(nèi)容,請?jiān)谟疑辖撬阉鱐OY模板網(wǎng)以前的文章或繼續(xù)瀏覽下面的相關(guān)文章,希望大家以后多多支持TOY模板網(wǎng)!