2022 Kaggle NLP Competition: Feedback Prize - English Language Learning
0. Competition Introduction
Competition page: Feedback Prize - English Language Learning | Kaggle
0.1 Competition Goal
Writing is a fundamental skill, yet few students get to hone it, because schools rarely assign writing tasks. Students learning English as a second language, known as English Language Learners (ELLs), are especially affected by this lack of practice. Existing tools cannot provide feedback tailored to a student's language proficiency, so final assessments may end up biased against these learners. Data science can improve automated feedback tools to better support their unique needs.
The goal of this competition is to assess the language proficiency of 8th-12th grade English Language Learners (ELLs). Using essays written by ELLs as the dataset, we develop models that better support the writing skills of all students.
The evaluation metric for this competition is MCRMSE, the mean columnwise root mean squared error (the original post showed the formula as a screenshot).
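Written out, the metric as defined on the competition's evaluation page is:

```latex
\mathrm{MCRMSE} = \frac{1}{N_t}\sum_{j=1}^{N_t}\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_{ij}-\hat{y}_{ij}\right)^{2}}
```

where N_t is the number of scored target columns (6 here), n is the number of essays, and y_ij and ŷ_ij are the true and predicted scores. In other words: the RMSE of each column, averaged over the columns.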
0.2 Dataset
The competition dataset (the ELLIPSE corpus) comprises argumentative essays written by 8th-12th grade English Language Learners (ELLs). Each essay is scored on six analytic measures: cohesion, syntax, vocabulary, phraseology, grammar, and conventions. Scores range from 1.0 to 5.0 in increments of 0.5; higher scores indicate greater proficiency in that measure. Your task is to predict the six scores for the essays in the test set. Some of these essays also appear in the datasets of the earlier Feedback Prize - Evaluating Student Writing and Feedback Prize - Predicting Effective Arguments competitions, and you are welcome to use those earlier datasets in this competition.
Files and fields:
train.csv: each essay is identified by a unique text_id; the full_text field holds the essay's full text, and six further columns hold the writing scores.
test.csv: only the text_id and full_text fields, and just three test samples.
sample_submission.csv: a sample submission file.
1. Setup
1.1 Import libraries
import os,gc,re,ast,sys,copy,json,time,datetime,math,string,pickle,random,joblib,itertools
from distutils.util import strtobool
'''
Python's warnings module controls how warning messages are displayed.
warnings.filterwarnings('ignore') suppresses all warnings so that they are not
printed while the program runs. This is useful when known but harmless warnings
would otherwise clutter the output.
'''
import warnings
warnings.filterwarnings('ignore')
import scipy as sp
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from tqdm.auto import tqdm
from sklearn.metrics import mean_squared_error  # mean squared error (MSE)
from sklearn.model_selection import StratifiedKFold, GroupKFold, KFold,train_test_split
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset
import torch.nn.functional as F
from torch.nn import Parameter
from torch.optim import Adam, SGD, AdamW
from torch.utils.checkpoint import checkpoint
import transformers, tokenizers
print(f'transformers.__version__: {transformers.__version__}')
print(f'tokenizers.__version__: {tokenizers.__version__}')
'''
AutoTokenizer loads a pretrained tokenizer, AutoModel loads a pretrained model,
and AutoConfig loads a model's configuration, each selected automatically from
the checkpoint name or path.
'''
from transformers import AutoTokenizer, AutoModel, AutoConfig
'''
These two functions build learning-rate schedulers with linear and cosine
warmup strategies, respectively.
'''
from transformers import get_linear_schedule_with_warmup, get_cosine_schedule_with_warmup
'''
The transformers tokenizers check the TOKENIZERS_PARALLELISM environment
variable at runtime; it controls whether tokenization uses parallel processing
for speed. Setting it to 'true' enables parallelism.
'''
os.environ['TOKENIZERS_PARALLELISM']='true'
transformers.__version__: 4.30.2
tokenizers.__version__: 0.13.3
1.2 Hyperparameters and random seed
class CFG:
    str_now = datetime.datetime.now().strftime('%Y%m%d-%H%M')
    model = 'deberta-v3-base'
    model_path = '/hy-tmp/model'  # path to the pretrained model
    batch_size, n_target, num_workers = 8, 6, 4
    target_cols = ['cohesion', 'syntax', 'vocabulary', 'phraseology', 'grammar', 'conventions']
    epoch, print_freq = 5, 20  # print training stats every 20 steps
    loss_func = 'RMSE'  # 'SmoothL1', 'RMSE'
    pooling = 'attention'  # mean, max, min, attention, weightedlayer
    gradient_checkpointing = True  # trade compute for memory by recomputing activations during backprop
    gradient_accumulation_steps = 1  # number of steps over which to accumulate gradients before updating
    max_grad_norm = 1000  # gradient clipping threshold
    apex = True  # whether to use automatic mixed-precision training
    scheduler = 'cosine'
    # num_cycles: number of cosine cycles over the schedule (default 0.5).
    # num_warmup_steps: number of steps in the learning-rate warmup phase. Warmup is a common
    # scheduling strategy that gradually raises the learning rate from zero to its maximum at
    # the start of training, avoiding overly large early updates that could destabilize the model.
    num_cycles, num_warmup_steps = 0.5, 0
    encoder_lr, decoder_lr, min_lr = 2e-5, 2e-5, 1e-6
    max_len = 512
    weight_decay = 0.01  # weight-decay coefficient for the parameter groups that use it
    fgm = True  # whether to use FGM adversarial training
    wandb = True  # whether to enable wandb
    adv_lr, adv_eps, eps, betas = 1, 0.2, 1e-6, (0.9, 0.999)  # FGM step size and epsilon, plus AdamW's eps and betas
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')  # use the GPU if available, else the CPU
    save_all_models = False  # whether to save a checkpoint every epoch
    OUTPUT_DIR = f"/hy-tmp/{model}/"
    train_file = '/hy-tmp/data/train.csv'
    test_file = '/hy-tmp/data/test.csv'
    submission_file = '/hy-tmp/data/sample_submission.csv'
if not os.path.exists(CFG.OUTPUT_DIR):
os.makedirs(CFG.OUTPUT_DIR)
CFG.OUTPUT_DIR
'/hy-tmp/deberta-v3-base/'
Set the random seed so that every run produces the same results.
def set_seeds(seed):
    random.seed(seed)  # seed Python's built-in random module
    np.random.seed(seed)  # seed NumPy's random generator
    torch.manual_seed(seed)  # seed PyTorch's generator (CPU only)
    if torch.cuda.is_available():
        '''
        If a GPU is available, use manual_seed and manual_seed_all to seed the
        current GPU and all GPUs, respectively.
        '''
        torch.cuda.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True  # force cuDNN to use deterministic algorithms
set_seeds(1111)
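As a quick illustration of what seeding buys us (a minimal sketch using only random and numpy; the torch calls behave analogously), reseeding makes the generators replay exactly the same sequence:

```python
import random
import numpy as np

def set_basic_seeds(seed):
    # seed Python's and NumPy's generators (torch.manual_seed works the same way)
    random.seed(seed)
    np.random.seed(seed)

set_basic_seeds(1111)
first = (random.random(), np.random.rand(3).tolist())

set_basic_seeds(1111)  # reseed: the exact same "random" numbers come back
second = (random.random(), np.random.rand(3).tolist())

print(first == second)  # True
```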
A separate note on torch.backends.cudnn.deterministic = True: cuDNN is a library for accelerating GPU computation. PyTorch automatically detects whether a supported cuDNN is installed and uses it to speed up GPU work, and internally it may use non-deterministic algorithms. When computing a convolution, the kernel is multiplied element-wise with the data and the products are accumulated, and the accumulation order matters: GPU threads run in parallel, each thread may add into the same output location at a different time, and which thread finishes first can change from run to run. Perhaps one run accumulates position (0,0) first, while the next run accumulates (1,1) first.
For integer arithmetic a change in accumulation order does not affect the final result. But for floating-point arithmetic, which carries rounding error, the order of accumulation can change the final value. Since our goal is reproducibility, we want every run to give the same result, so we set this flag to True: PyTorch then computes convolutions and pooling (aggregation) in a fixed default order. The benefit is easier reproduction; the cost is that computation may be somewhat slower.
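The floating-point effect described above is easy to see even on the CPU: addition is not associative, so a different accumulation order can give a slightly different sum:

```python
# floating-point addition is not associative: grouping changes the result
a, b, c = 0.1, 0.2, 0.3
print((a + b) + c == a + (b + c))  # False: 0.6000000000000001 vs 0.6
```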
2. Data preprocessing
2.1 Define the preprocessing function and tokenize the text
To process the training and test sets uniformly, the test set gets a placeholder label = [0, 0, 0, 0, 0, 0].
def preprocess(df, tokenizer, types=True):
    # types indicates whether df is the training set
    if types:
        labels = np.array(df[["cohesion", "syntax", "vocabulary", "phraseology", "grammar", "conventions"]])  # a NumPy array
    else:
        labels = df['labels']  # a pandas Series; a NumPy array would work too. Any indexable container is fine, since these placeholder labels are never actually used later
    text = list(df['full_text'].iloc[:])
    encoding = tokenizer(text, truncation=True, padding='max_length', max_length=CFG.max_len, return_tensors='np')
    return encoding, labels
df = pd.read_csv(CFG.train_file)
train_df, val_df = train_test_split(df, test_size=0.2, random_state=1111, shuffle=True)
test_df = pd.read_csv(CFG.test_file)
test_df['labels'] = None  # add a labels column so the test set can reuse the same Dataset class
test_df['labels'] = test_df['labels'].apply(lambda x: [0, 0, 0, 0, 0, 0])  # give each test row a placeholder label of length 6
tokenizer = AutoTokenizer.from_pretrained(CFG.model_path)  # load the tokenizer; this directory holds the deberta-v3-base files
train_encoding, train_label = preprocess(train_df, tokenizer, True)
val_encoding, val_label = preprocess(val_df, tokenizer, True)
test_encoding, test_label = preprocess(test_df, tokenizer, False)
My CFG.model_path directory contains the checkpoint files (download link: deberta-v3-base download page). The tf_model.h5 file there does not need to be downloaded, since it is the TensorFlow version of the model.
2.2 Define the Dataset and wrap the data in DataLoaders
from torch.utils.data import Dataset, DataLoader
class MyDataset(Dataset):
    def __init__(self, encoding, label):
        super().__init__()
        self.inputs = encoding
        self.label = label
    # length of the dataset
    def __len__(self):
        return len(self.label)
    # fetch a single sample
    def __getitem__(self, index):
        '''
        inputs holds three fields (input_ids, token_type_ids, attention_mask), each a 2-D array
        whose rows are the samples. To fetch the sample at position index, we take row index from
        each of the three fields in turn, and convert the selected data to tensors, since anything
        fed to PyTorch for computation must be a tensor.
        '''
        item = {key: torch.tensor(val[index], dtype=torch.long) for key, val in self.inputs.items()}
        label = torch.tensor(self.label[index], dtype=torch.float)
        return item, label
train_dataset = MyDataset(train_encoding, train_label)
val_dataset = MyDataset(val_encoding, val_label)
test_dataset = MyDataset(test_encoding, test_label)
train_loader = DataLoader(train_dataset, batch_size=CFG.batch_size, shuffle=True, num_workers=CFG.num_workers)
val_loader = DataLoader(val_dataset, batch_size=CFG.batch_size, shuffle=True, num_workers=CFG.num_workers)
test_loader = DataLoader(test_dataset, batch_size=CFG.batch_size, shuffle=False, num_workers=CFG.num_workers)  # the test set must never be shuffled
Let us print the first batch of test_loader. Note that the test set has only 3 rows in total, so this first batch contains just those three.
for i in test_loader:
print(i)
break
The output below is a single dict containing three fields, with each field holding three rows of data. That looks inconsistent with the __getitem__ function we wrote above: it returns one dict per sample, so a batch fetched through it might be expected to be a list containing three dicts, yet what we actually get is one dict whose every field contains three rows. The next section explains why.
[{'input_ids': tensor([[ 1, 335, 266, ..., 265, 262, 2],
[ 1, 771, 274, ..., 0, 0, 0],
[ 1, 2651, 9805, ..., 0, 0, 0]]), 'token_type_ids': tensor([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]]), 'attention_mask': tensor([[1, 1, 1, ..., 1, 1, 1],
[1, 1, 1, ..., 0, 0, 0],
[1, 1, 1, ..., 0, 0, 0]])}, tensor([[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.],
[0., 0., 0., 0., 0., 0.]])]
When we use a DataLoader to fetch a batch from a custom Dataset, the DataLoader calls its collate_fn to combine the individual samples into a batch. By default it uses the default_collate function, which can handle the common data types: tensors, lists, dicts, and so on.
If our Dataset's __getitem__ returns a dict, then a batch fetched through the DataLoader is also a dict, where each key maps to a tensor of batch size. For example, if __getitem__ returns the following dict:
{
"input_ids": torch.tensor([1, 2, 3]),
"attention_mask": torch.tensor([1, 1, 0])
}
then a batch of size 2 fetched through the DataLoader might look like this:
{
"input_ids": torch.tensor([[1, 2, 3], [4, 5, 6]]),
"attention_mask": torch.tensor([[1, 1, 0], [1, 0, 0]])
}
Each key maps to a tensor of shape (batch_size, ...).
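The stacking that default_collate performs for dicts can be sketched in a few lines (a simplified illustration using numpy in place of torch tensors; the simple_collate function below is hypothetical, not part of PyTorch):

```python
import numpy as np

def simple_collate(samples):
    # samples: a list of per-sample dicts; stack each field into one batch array,
    # mirroring what torch's default_collate does for dicts of tensors
    return {key: np.stack([s[key] for s in samples]) for key in samples[0]}

samples = [
    {"input_ids": np.array([1, 2, 3]), "attention_mask": np.array([1, 1, 0])},
    {"input_ids": np.array([4, 5, 6]), "attention_mask": np.array([1, 0, 0])},
]
batch = simple_collate(samples)
print(batch["input_ids"].shape)  # (2, 3)
```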
With the data loaded, we next write some helper functions: the loss function, the evaluation metric, and so on.
3. Helper functions
Define RMSELoss, the MCRMSE evaluation metric, the logger, FGM, and so on.
RMSELoss below is the loss function used during training, while MCRMSE is the metric used to score the validation set. Keep the distinction in mind: RMSELoss drives gradient descent and backpropagation, whereas MCRMSE is only a score that we compute and inspect.
# loss function used for backpropagation
class RMSELoss(nn.Module):
def __init__(self, reduction='mean', eps=1e-9):
super().__init__()
self.mse = nn.MSELoss(reduction='none')
self.reduction = reduction
self.eps = eps
def forward(self, y_pred, y_true):
loss = torch.sqrt(self.mse(y_pred, y_true) + self.eps)
if self.reduction == 'none':
loss = loss
elif self.reduction == 'sum':
loss = loss.sum()
elif self.reduction == 'mean':
loss = loss.mean()
return loss
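One subtlety worth noticing: because the square root is applied element-wise before the reduction, this "RMSE" loss with reduction='mean' is really the mean of per-element root errors (approximately the mean absolute error), not the root of the mean squared error. A numpy sketch of the same arithmetic:

```python
import numpy as np

y_pred = np.array([3.0, 4.0])
y_true = np.array([0.0, 0.0])
eps = 1e-9

# what RMSELoss(reduction='mean') computes: sqrt per element, then the mean
elementwise = np.sqrt((y_pred - y_true) ** 2 + eps).mean()
# a "true" RMSE would instead take the root after averaging
true_rmse = np.sqrt(((y_pred - y_true) ** 2).mean())

print(round(float(elementwise), 4), round(float(true_rmse), 4))  # 3.5 3.5355
```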
# metric used for scoring
def MCRMSE(y_trues, y_preds):
    scores = []
    idxes = y_trues.shape[1]
    for i in range(idxes):
        y_true = y_trues[:, i]  # one column of the 2-D array, taken out as a 1-D array
        y_pred = y_preds[:, i]
        score = mean_squared_error(y_true, y_pred, squared=False)  # RMSE (root mean squared error)
        scores.append(score)
    mcrmse_score = np.mean(scores)  # average the per-column RMSEs to get MCRMSE
    return mcrmse_score, scores
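A tiny worked example of MCRMSE (pure numpy, mirroring the function above without the sklearn dependency):

```python
import numpy as np

y_trues = np.array([[1.0, 2.0],
                    [3.0, 4.0]])
y_preds = np.array([[2.0, 2.0],
                    [4.0, 4.0]])  # column 0 is off by 1 everywhere, column 1 is exact

# per-column RMSE, then the mean across the columns
col_rmse = np.sqrt(((y_trues - y_preds) ** 2).mean(axis=0))
mcrmse = col_rmse.mean()
print(col_rmse.tolist(), mcrmse)  # [1.0, 0.0] 0.5
```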
# helper for tracking the running loss
class AverageMeter(object):
def __init__(self):
self.reset()
def reset(self):
        self.val = 0  # loss of the most recent batch
        self.avg = 0  # running average loss over the epoch so far
        self.sum = 0  # sum of the losses this epoch
        self.count = 0  # number of samples seen, used to turn the sum into the average
def update(self, val, n=1):
self.val = val
self.sum += val * n
self.count += n
self.avg = self.sum / self.count
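A quick usage example: feed AverageMeter two batches of different sizes and it keeps a correctly weighted running average (the class is restated below so the snippet is self-contained):

```python
# minimal re-statement of AverageMeter to show its bookkeeping
class AverageMeter:
    def __init__(self):
        self.val = self.avg = self.sum = self.count = 0
    def update(self, val, n=1):
        self.val = val          # latest batch loss
        self.sum += val * n     # weight by batch size
        self.count += n
        self.avg = self.sum / self.count

losses = AverageMeter()
losses.update(0.5, n=8)   # a batch of 8 with mean loss 0.5
losses.update(1.0, n=2)   # a smaller final batch of 2 with mean loss 1.0
print(losses.avg)  # 0.6  ->  (0.5*8 + 1.0*2) / 10
```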
# convert seconds to minutes and seconds
def asMinutes(s):
    m = math.floor(s / 60)  # whole minutes (floor)
    s -= m * 60
    return f'{int(m)}m {int(s)}s'
# estimate the time remaining
def timeSince(since, percent):
    now = time.time()
    s = now - since  # seconds elapsed since the start
    es = s / (percent)  # projected total time for the epoch; percent is current step / len(dataloader), a fraction
    rs = es - s  # time remaining
    return f'{str(asMinutes(s))} (remain {str(asMinutes(rs))})'
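Worked examples of the two helpers (restated so the snippet is self-contained): 125 seconds formats as '2m 5s', and if 60 seconds have elapsed at 25% progress, roughly 180 seconds remain:

```python
import math
import time

def asMinutes(s):
    m = math.floor(s / 60)
    s -= m * 60
    return f'{int(m)}m {int(s)}s'

def timeSince(since, percent):
    s = time.time() - since
    es = s / percent          # projected total time
    return f'{asMinutes(s)} (remain {asMinutes(es - s)})'

print(asMinutes(125))                      # 2m 5s
print(timeSince(time.time() - 60, 0.25))   # about: 1m 0s (remain 3m 0s)
```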
def get_logger(filename=CFG.OUTPUT_DIR + 'train'):
    from logging import getLogger, INFO, StreamHandler, FileHandler, Formatter
    logger = getLogger(__name__)  # create a logger named __name__
    logger.setLevel(INFO)  # record only INFO-level messages and above
    handler1 = StreamHandler()  # stream handler
    handler1.setFormatter(Formatter("%(message)s"))  # output format
    handler2 = FileHandler(filename=f"{filename}.log")  # file handler
    handler2.setFormatter(Formatter("%(message)s"))  # output format
    logger.addHandler(handler1)  # log to the screen
    logger.addHandler(handler2)  # log to a file
    return logger
logger= get_logger()
logger
Fast Gradient Method (FGM)
For a PyTorch tutorial on adversarial training in NLP, see this video: # NLP中的對抗訓(xùn)練
class FGM():
def __init__(self, model):
self.model = model
self.backup = {}
def attack(self, epsilon = 1., emb_name = 'word_embeddings'):
for name, param in self.model.named_parameters():
if param.requires_grad and emb_name in name:
self.backup[name] = param.data.clone()
norm = torch.norm(param.grad)
if norm != 0:
r_at = epsilon * param.grad / norm
param.data.add_(r_at)
def restore(self, emb_name = 'word_embeddings'):
for name, param in self.model.named_parameters():
if param.requires_grad and emb_name in name:
assert name in self.backup
param.data = self.backup[name]
self.backup = {}
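The perturbation FGM adds is r = epsilon * g / ||g||: a step of fixed L2 norm epsilon in the gradient direction. Here is a numpy sketch of attack()'s arithmetic (the epsilon of 1.0 matches the class's default; the gradient values are made up for illustration):

```python
import numpy as np

epsilon = 1.0                      # matches FGM's default
grad = np.array([3.0, 4.0])        # stand-in for the word-embedding gradient

norm = np.linalg.norm(grad)        # 5.0
r_at = epsilon * grad / norm       # the adversarial perturbation added to the embedding

# the perturbation always has L2 norm epsilon, regardless of the gradient's magnitude
print(r_at.tolist())  # [0.6, 0.8], whose norm is ~ 1.0
```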
4. Pooling
# Attention pooling
class AttentionPooling(nn.Module):
    # Ignoring the batch dimension, the input is (sequence_length, hidden_size): each token has a
    # vector, and there are sequence_length of them. Attention pooling uses attention_mask to
    # adjust each token's weight, then takes the weighted sum of the token vectors, producing a
    # sentence vector that accounts for every token.
    def __init__(self, in_dim):
        super().__init__()
        self.attention = nn.Sequential(
            nn.Linear(in_dim, in_dim),  # (batch_size, sequence_length, hidden_size): a linear transform of the model's last hidden layer
            nn.LayerNorm(in_dim),  # normalization; it stabilizes the intermediate distributions, giving smoother gradients, faster training, and better generalization
            nn.GELU(),  # activation function
            nn.Linear(in_dim, 1),  # (batch_size, sequence_length, 1): one scalar weight per token, and the number of tokens is the sequence length
        )
    def forward(self, last_hidden_state, attention_mask):
        w = self.attention(last_hidden_state).float()  # score each token's contribution to the sentence
        # w has shape (batch_size, sequence_length, 1)
        w[attention_mask == 0] = float('-inf')  # set the padding positions to -inf so their weight becomes 0 after the softmax
        w = torch.softmax(w, 1)  # softmax over the sequence dimension so the weights at all token positions sum to 1
        # sum over the sequence dimension: the weighted sum of the token vectors, by w, gives the sentence vector
        attention_embeddings = torch.sum(w * last_hidden_state, dim=1)  # (batch_size, hidden_size): the semantic sentence vector
        # return one sentence vector per sample in the batch
        return attention_embeddings
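The masking-and-softmax step is the crux. Here is the same computation in numpy on a toy batch (one sample, three tokens, the last one padding; the scores are made up), showing that padded positions get weight 0 and the remaining weights sum to 1:

```python
import numpy as np

# toy scores from the attention MLP: shape (batch=1, seq_len=3, 1)
w = np.array([[[2.0], [1.0], [0.5]]])
attention_mask = np.array([[1, 1, 0]])   # the last token is padding

w[attention_mask == 0] = -np.inf         # padding gets -inf before the softmax
w = np.exp(w - w.max(axis=1, keepdims=True))
w = w / w.sum(axis=1, keepdims=True)     # softmax over the sequence dimension

print(w[0, :, 0].round(3))               # padded position -> 0, the rest sum to 1
```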
5. Building the model
class FB3Model(nn.Module):
    def __init__(self, CFG, config_path=None, pretrained=False):
        super().__init__()
        self.CFG = CFG
        # load the configuration
        if config_path is None:
            self.config = AutoConfig.from_pretrained(CFG.model_path, output_hidden_states=True)
            self.config.save_pretrained(CFG.OUTPUT_DIR + 'config')
            self.config.hidden_dropout = 0.  # dropout probability for the hidden layers, i.e. the fraction of hidden units randomly dropped to prevent overfitting; the default is usually 0.1, and 0 disables dropout
            self.config.hidden_dropout_prob = 0.  # the same setting under a different name
            self.config.attention_dropout = 0.  # dropout probability for the attention matrix, i.e. the fraction of its entries randomly zeroed to prevent overfitting; again usually 0.1 by default, and 0 disables it
            self.config.attention_probs_dropout_prob = 0.  # the same setting under a different name
            logger.info(self.config)
        else:
            self.config = torch.load(config_path)
        # load the pretrained model
        if pretrained:
            self.model = AutoModel.from_pretrained(CFG.model_path, config=self.config)
        else:
            self.model = AutoModel.from_config(self.config)
        # choose the pooling method
        if CFG.pooling == 'attention':
            self.pool = AttentionPooling(self.config.hidden_size)
        self.fc = nn.Linear(self.config.hidden_size, self.CFG.n_target)
    def forward(self, inputs):
        outputs = self.model(**inputs)  # inputs unpacks into input_ids / token_type_ids / attention_mask
        # pool the last hidden layer, of shape (batch_size, sequence_length, hidden_size), into
        # sentence vectors; the pooling layer also needs the attention mask to ignore padding
        feature = self.pool(outputs.last_hidden_state, inputs['attention_mask'])
        output = self.fc(feature)  # map each sentence vector to the 6 target scores
        return output
model = FB3Model(CFG, config_path=None, pretrained=True)
torch.save(model.config, './config.pth')
model.to(CFG.device)
6. Training and validation functions
6.1 Define the parameter optimizer and the learning-rate scheduler
For the principles behind, and the differences between, the linear and cosine learning-rate schedulers, see this article: # Transformers之自定義學(xué)習(xí)率動態(tài)調(diào)整
def get_optimizer_params(model, encoder_lr, decoder_lr, weight_decay=0.0):
    no_decay = ["bias", "LayerNorm.bias", "LayerNorm.weight"]
    optimizer_parameters = [
        # backbone parameters that get weight decay
        {'params': [p for n, p in model.model.named_parameters() if not any(nd in n for nd in no_decay)],
         'lr': encoder_lr,
         'weight_decay': weight_decay},
        # backbone biases and LayerNorm parameters, exempt from weight decay
        {'params': [p for n, p in model.model.named_parameters() if any(nd in n for nd in no_decay)],
         'lr': encoder_lr,
         'weight_decay': 0.0},
        # everything outside the backbone (the pooling layer and fc head), at the decoder learning rate
        {'params': [p for n, p in model.named_parameters() if "model" not in n],
         'lr': decoder_lr,
         'weight_decay': 0.0}
    ]
    return optimizer_parameters
# choose between linear and cosine learning-rate decay
def get_scheduler(cfg, optimizer, num_train_steps):
if cfg.scheduler == 'linear':
scheduler = get_linear_schedule_with_warmup(
optimizer,
num_warmup_steps = cfg.num_warmup_steps,
num_training_steps = num_train_steps
)
elif cfg.scheduler == 'cosine':
scheduler = get_cosine_schedule_with_warmup(
optimizer,
num_warmup_steps = cfg.num_warmup_steps,
num_training_steps = num_train_steps,
num_cycles = cfg.num_cycles
)
return scheduler
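To see what the cosine option does, here is the multiplier applied to the base learning rate, restated from the formula transformers documents for get_cosine_schedule_with_warmup (linear warmup, then a cosine decay governed by num_cycles); treat this as a sketch of the schedule's shape, not the library's internals:

```python
import math

def cosine_with_warmup(step, warmup, total, num_cycles=0.5):
    # linear warmup from 0 to 1, then cosine decay toward 0
    if step < warmup:
        return step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return max(0.0, 0.5 * (1.0 + math.cos(math.pi * num_cycles * 2.0 * progress)))

# the multiplier is 0 at step 0, peaks at 1.0 right after warmup,
# and with num_cycles=0.5 reaches 0 at the final step
for step in [0, 10, 50, 100]:
    print(step, round(cosine_with_warmup(step, warmup=10, total=100), 3))
```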
from torch.optim import AdamW
# the optimizer for the model parameters
optimizer_parameters = get_optimizer_params(model, CFG.encoder_lr, CFG.decoder_lr, CFG.weight_decay)  # our custom parameter groups
optimizer = AdamW(optimizer_parameters, lr=CFG.encoder_lr, eps=CFG.eps, betas=CFG.betas)  # the lr set here is the fallback for any parameters without a group-specific learning rate
# the scheduler for learning-rate decay
num_train_steps = len(train_loader) * CFG.epoch  # total number of training steps, i.e. the number of batches
scheduler = get_scheduler(CFG, optimizer, num_train_steps)  # the scheduler needs the total number of batches
if CFG.loss_func == 'SmoothL1':
criterion = nn.SmoothL1Loss(reduction='mean')
elif CFG.loss_func == 'RMSE':
criterion = RMSELoss(reduction='mean')
6.2 Define the training and evaluation functions
The scaler used in these functions is an instance of the automatic mixed-precision machinery; this article is a good introduction: # PyTorch的自動混合精度(AMP)
Having rarely used a scheduler before, I was puzzled when I first saw scheduler.step(), so I read the article below and learned how optimizer.step() and scheduler.step() relate: # PyTorch中的optimizer和scheduler
def train_fn(train_loader, model, criterion, optimizer, epoch, scheduler, device):
    losses = AverageMeter()
    model.train()  # switch to training mode
    scaler = torch.cuda.amp.GradScaler(enabled=CFG.apex)  # gradient scaler for automatic mixed-precision training
    start = end = time.time()
    global_step = 0
    if CFG.fgm:
        fgm = FGM(model)  # adversarial training
    for step, (inputs, labels) in enumerate(train_loader):
        # a dict cannot be moved with a single .to(device) call, so move each tensor individually
        for k, v in inputs.items():
            inputs[k] = v.to(device)
        labels = labels.to(device)
        batch_size = labels.size(0)  # labels has shape [N, 6]; size(0) gives N, read dynamically because the last batch may be smaller
        with torch.cuda.amp.autocast(enabled=CFG.apex):
            y_preds = model(inputs)  # forward pass
            loss = criterion(y_preds, labels)  # compute the loss
        if CFG.gradient_accumulation_steps > 1:
            loss = loss / CFG.gradient_accumulation_steps
        losses.update(loss.item(), batch_size)  # update the running loss
        scaler.scale(loss).backward()  # AMP backward pass
        grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), CFG.max_grad_norm)  # gradient clipping, done in place; grad_norm is kept only for logging (note the gradients are still scaled by the GradScaler here, which is why the logged values are so large)
        # Fast Gradient Method (FGM)
        if CFG.fgm:
            fgm.attack()  # perturb the embeddings
            with torch.cuda.amp.autocast(enabled=CFG.apex):
                y_preds = model(inputs)  # forward pass with the perturbed embeddings
                loss_adv = criterion(y_preds, labels)
            loss_adv.backward()  # these gradients accumulate on top of the unperturbed ones
            fgm.restore()  # restore the original embeddings
        if (step + 1) % CFG.gradient_accumulation_steps == 0:
            # the two lines below update the parameters, equivalent to optimizer.step()
            scaler.step(optimizer)  # if the gradients contain no infs or NaNs, this calls optimizer.step() internally; otherwise the step is skipped so the weights are not corrupted
            scaler.update()  # update the scale factor
            optimizer.zero_grad()  # clear the gradients
            global_step += 1
            scheduler.step()  # update the learning rate; this does not depend on the gradients, so doing it after zero_grad is fine
        end = time.time()
        if step % CFG.print_freq == 0 or step == (len(train_loader) - 1):
            print('Epoch: [{0}][{1}/{2}] '
                  'Elapsed {remain:s} '
                  'Loss: {loss.val:.4f}({loss.avg:.4f}) '
                  'Grad: {grad_norm:.4f} '
                  'LR: {lr:.8f} '
                  .format(epoch + 1, step, len(train_loader), remain=timeSince(start, float(step + 1) / len(train_loader)),
                          loss=losses,
                          grad_norm=grad_norm,
                          lr=scheduler.get_last_lr()[0]))
    return losses.avg
The validation function
def valid_fn(valid_loader, model, criterion, device):
losses = AverageMeter()
    model.eval()  # switch to evaluation mode
preds ,targets= [],[]
start = end = time.time()
for step, (inputs, labels) in enumerate(valid_loader):
for k, v in inputs.items():
inputs[k] = v.to(device)
labels = labels.to(device)
batch_size = labels.size(0)
        with torch.no_grad():  # no gradient computation during evaluation
y_preds = model(inputs)
loss = criterion(y_preds, labels)
if CFG.gradient_accumulation_steps > 1:
loss = loss / CFG.gradient_accumulation_steps
losses.update(loss.item(), batch_size)
preds.append(y_preds.to('cpu').numpy())
targets.append(labels.to('cpu').numpy())
end = time.time()
if step % CFG.print_freq == 0 or step == (len(valid_loader)-1):
print('EVAL: [{0}/{1}] '
'Elapsed {remain:s} '
'Loss: {loss.val:.4f}({loss.avg:.4f}) '
.format(step, len(valid_loader),
loss=losses,
remain=timeSince(start, float(step+1)/len(valid_loader))))
predictions = np.concatenate(preds)
targets=np.concatenate(targets)
return losses.avg, predictions,targets
The overall training loop
def train_loop():
best_score = np.inf
for epoch in range(CFG.epoch):
start_time = time.time()
logger.info(f"========== epoch: {epoch} training ==========")
avg_loss = train_fn(train_loader, model, criterion, optimizer, epoch, scheduler, CFG.device)
avg_val_loss, predictions,valid_labels = valid_fn(val_loader, model, criterion, CFG.device)
        score, scores = MCRMSE(valid_labels, predictions)  # the mean score across all metrics, plus each metric's individual score
elapsed = time.time() - start_time
logger.info(f'Epoch {epoch+1} - avg_train_loss: {avg_loss:.4f} avg_val_loss: {avg_val_loss:.4f} time: {elapsed:.0f}s')
logger.info(f'Epoch {epoch+1} - Score: {score:.4f} Scores: {scores}')
        # if the new score is better (lower), save this improved model
if best_score > score:
best_score = score
logger.info(f'Epoch {epoch+1} - Save Best Score: {best_score:.4f} Model')
torch.save({'model': model.state_dict(),
'predictions': predictions},
CFG.OUTPUT_DIR + "_best.pth")
        # if save_all_models is set, also save a checkpoint every epoch
        if CFG.save_all_models:
            torch.save({'model': model.state_dict(),
                        'predictions': predictions},
                       CFG.OUTPUT_DIR + f"_epoch{epoch + 1}.pth")
Now run it:
train_loop()
========== epoch: 0 training ==========
Epoch: [1][0/391] Elapsed 0m 0s (remain 5m 3s) Loss: 1.9498(1.9498) Grad: inf LR: 0.00002000
Epoch: [1][20/391] Elapsed 0m 9s (remain 2m 40s) Loss: 0.5507(0.9485) Grad: 88425.0312 LR: 0.00001999
Epoch: [1][40/391] Elapsed 0m 17s (remain 2m 29s) Loss: 0.4651(0.7112) Grad: 125639.5469 LR: 0.00001997
Epoch: [1][60/391] Elapsed 0m 25s (remain 2m 20s) Loss: 0.3574(0.6195) Grad: 48965.1836 LR: 0.00001993
Epoch: [1][80/391] Elapsed 0m 34s (remain 2m 11s) Loss: 0.4714(0.5734) Grad: 74479.8281 LR: 0.00001989
Epoch: [1][100/391] Elapsed 0m 42s (remain 2m 2s) Loss: 0.4562(0.5377) Grad: 114286.3984 LR: 0.00001984
Epoch: [1][120/391] Elapsed 0m 51s (remain 1m 54s) Loss: 0.4210(0.5139) Grad: 73615.7266 LR: 0.00001978
Epoch: [1][140/391] Elapsed 0m 59s (remain 1m 45s) Loss: 0.4222(0.5011) Grad: 124342.8672 LR: 0.00001971
Epoch: [1][160/391] Elapsed 1m 8s (remain 1m 37s) Loss: 0.3030(0.4896) Grad: 110364.0078 LR: 0.00001962
Epoch: [1][180/391] Elapsed 1m 16s (remain 1m 28s) Loss: 0.5538(0.4809) Grad: 106854.4375 LR: 0.00001953
Epoch: [1][200/391] Elapsed 1m 24s (remain 1m 20s) Loss: 0.4477(0.4722) Grad: 269420.5938 LR: 0.00001943
Epoch: [1][220/391] Elapsed 1m 33s (remain 1m 11s) Loss: 0.3516(0.4660) Grad: 80327.8438 LR: 0.00001932
Epoch: [1][240/391] Elapsed 1m 41s (remain 1m 3s) Loss: 0.3896(0.4609) Grad: 63723.8398 LR: 0.00001920
Epoch: [1][260/391] Elapsed 1m 50s (remain 0m 54s) Loss: 0.3637(0.4565) Grad: 57669.6133 LR: 0.00001907
Epoch: [1][280/391] Elapsed 1m 58s (remain 0m 46s) Loss: 0.3613(0.4510) Grad: 113929.4141 LR: 0.00001893
Epoch: [1][300/391] Elapsed 2m 6s (remain 0m 37s) Loss: 0.4049(0.4482) Grad: 126043.1719 LR: 0.00001878
Epoch: [1][320/391] Elapsed 2m 15s (remain 0m 29s) Loss: 0.3543(0.4435) Grad: 114538.3203 LR: 0.00001862
Epoch: [1][340/391] Elapsed 2m 23s (remain 0m 21s) Loss: 0.3910(0.4410) Grad: 56835.2969 LR: 0.00001845
Epoch: [1][360/391] Elapsed 2m 32s (remain 0m 12s) Loss: 0.3834(0.4390) Grad: 104753.5312 LR: 0.00001827
Epoch: [1][380/391] Elapsed 2m 40s (remain 0m 4s) Loss: 0.2973(0.4361) Grad: 112279.6641 LR: 0.00001809
Epoch: [1][390/391] Elapsed 2m 44s (remain 0m 0s) Loss: 0.3189(0.4346) Grad: 102051.1328 LR: 0.00001799
EVAL: [0/98] Elapsed 0m 0s (remain 0m 59s) Loss: 0.3986(0.3986)
EVAL: [20/98] Elapsed 0m 5s (remain 0m 20s) Loss: 0.3622(0.3786)
EVAL: [40/98] Elapsed 0m 10s (remain 0m 14s) Loss: 0.3669(0.3765)
EVAL: [60/98] Elapsed 0m 15s (remain 0m 9s) Loss: 0.3587(0.3800)
EVAL: [80/98] Elapsed 0m 20s (remain 0m 4s) Loss: 0.2795(0.3794)
Epoch 1 - avg_train_loss: 0.4346 avg_val_loss: 0.3815 time: 190s
Epoch 1 - Score: 0.4793 Scores: [0.50504076, 0.4751229, 0.43030012, 0.47523162, 0.530547, 0.4594006]
Epoch 1 - Save Best Score: 0.4793 Model
EVAL: [97/98] Elapsed 0m 24s (remain 0m 0s) Loss: 0.5805(0.3815)
========== epoch: 1 training ==========
Epoch: [2][0/391] Elapsed 0m 0s (remain 5m 4s) Loss: 0.2979(0.2979) Grad: inf LR: 0.00001799
Epoch: [2][20/391] Elapsed 0m 9s (remain 2m 41s) Loss: 0.4277(0.3779) Grad: 126436.4922 LR: 0.00001779
Epoch: [2][40/391] Elapsed 0m 17s (remain 2m 29s) Loss: 0.4425(0.3725) Grad: 100469.0000 LR: 0.00001758
Epoch: [2][60/391] Elapsed 0m 25s (remain 2m 20s) Loss: 0.3419(0.3731) Grad: 86438.8672 LR: 0.00001737
Epoch: [2][80/391] Elapsed 0m 34s (remain 2m 11s) Loss: 0.2846(0.3727) Grad: 113561.5391 LR: 0.00001715
Epoch: [2][100/391] Elapsed 0m 42s (remain 2m 2s) Loss: 0.4614(0.3782) Grad: 71337.5234 LR: 0.00001692
Epoch: [2][120/391] Elapsed 0m 51s (remain 1m 54s) Loss: 0.4372(0.3745) Grad: 128858.8438 LR: 0.00001668
Epoch: [2][140/391] Elapsed 0m 59s (remain 1m 45s) Loss: 0.3464(0.3741) Grad: 92793.8203 LR: 0.00001644
Epoch: [2][160/391] Elapsed 1m 8s (remain 1m 37s) Loss: 0.3312(0.3708) Grad: 82513.1406 LR: 0.00001619
Epoch: [2][180/391] Elapsed 1m 16s (remain 1m 28s) Loss: 0.3312(0.3705) Grad: 201764.6094 LR: 0.00001594
Epoch: [2][200/391] Elapsed 1m 24s (remain 1m 20s) Loss: 0.4366(0.3740) Grad: 135784.6406 LR: 0.00001567
Epoch: [2][220/391] Elapsed 1m 33s (remain 1m 11s) Loss: 0.2683(0.3761) Grad: 140406.2031 LR: 0.00001541
Epoch: [2][240/391] Elapsed 1m 41s (remain 1m 3s) Loss: 0.4170(0.3754) Grad: 136473.2344 LR: 0.00001513
Epoch: [2][260/391] Elapsed 1m 50s (remain 0m 54s) Loss: 0.3923(0.3753) Grad: 73960.7891 LR: 0.00001486
Epoch: [2][280/391] Elapsed 1m 58s (remain 0m 46s) Loss: 0.3683(0.3752) Grad: 64848.6758 LR: 0.00001457
Epoch: [2][300/391] Elapsed 2m 7s (remain 0m 38s) Loss: 0.3474(0.3744) Grad: 131543.4062 LR: 0.00001428
Epoch: [2][320/391] Elapsed 2m 15s (remain 0m 29s) Loss: 0.3071(0.3739) Grad: 104604.7188 LR: 0.00001399
Epoch: [2][340/391] Elapsed 2m 24s (remain 0m 21s) Loss: 0.2994(0.3751) Grad: 76737.6875 LR: 0.00001369
Epoch: [2][360/391] Elapsed 2m 32s (remain 0m 12s) Loss: 0.4693(0.3759) Grad: 121103.0078 LR: 0.00001339
Epoch: [2][380/391] Elapsed 2m 41s (remain 0m 4s) Loss: 0.3609(0.3750) Grad: 103191.3594 LR: 0.00001309
Epoch: [2][390/391] Elapsed 2m 45s (remain 0m 0s) Loss: 0.3236(0.3746) Grad: 115441.6875 LR: 0.00001294
EVAL: [0/98] Elapsed 0m 0s (remain 1m 1s) Loss: 0.3850(0.3850)
EVAL: [20/98] Elapsed 0m 5s (remain 0m 20s) Loss: 0.4035(0.3799)
EVAL: [40/98] Elapsed 0m 10s (remain 0m 14s) Loss: 0.4204(0.3744)
EVAL: [60/98] Elapsed 0m 15s (remain 0m 9s) Loss: 0.3059(0.3784)
EVAL: [80/98] Elapsed 0m 20s (remain 0m 4s) Loss: 0.3372(0.3768)
Epoch 2 - avg_train_loss: 0.3746 avg_val_loss: 0.3800 time: 190s
Epoch 2 - Score: 0.4767 Scores: [0.49304673, 0.46059012, 0.44865215, 0.4811058, 0.48887262, 0.4878879]
Epoch 2 - Save Best Score: 0.4767 Model
EVAL: [97/98] Elapsed 0m 24s (remain 0m 0s) Loss: 0.3070(0.3800)
========== epoch: 2 training ==========
Epoch: [3][0/391] Elapsed 0m 0s (remain 5m 1s) Loss: 0.2899(0.2899) Grad: inf LR: 0.00001292
Epoch: [3][20/391] Elapsed 0m 9s (remain 2m 41s) Loss: 0.3787(0.3540) Grad: 175920.7812 LR: 0.00001261
Epoch: [3][40/391] Elapsed 0m 17s (remain 2m 30s) Loss: 0.2569(0.3505) Grad: 96343.5703 LR: 0.00001230
Epoch: [3][60/391] Elapsed 0m 26s (remain 2m 20s) Loss: 0.4611(0.3546) Grad: 181476.2188 LR: 0.00001199
Epoch: [3][80/391] Elapsed 0m 34s (remain 2m 11s) Loss: 0.3593(0.3523) Grad: 71936.0391 LR: 0.00001167
Epoch: [3][100/391] Elapsed 0m 42s (remain 2m 3s) Loss: 0.3393(0.3559) Grad: 109394.1875 LR: 0.00001135
Epoch: [3][120/391] Elapsed 0m 51s (remain 1m 54s) Loss: 0.3660(0.3575) Grad: 140890.3906 LR: 0.00001103
Epoch: [3][140/391] Elapsed 0m 59s (remain 1m 45s) Loss: 0.3748(0.3594) Grad: 123294.2266 LR: 0.00001071
Epoch: [3][160/391] Elapsed 1m 8s (remain 1m 37s) Loss: 0.3356(0.3578) Grad: 129286.8594 LR: 0.00001039
Epoch: [3][180/391] Elapsed 1m 16s (remain 1m 28s) Loss: 0.3279(0.3543) Grad: 134369.8438 LR: 0.00001007
Epoch: [3][200/391] Elapsed 1m 24s (remain 1m 20s) Loss: 0.3662(0.3531) Grad: 109224.3125 LR: 0.00000975
Epoch: [3][220/391] Elapsed 1m 33s (remain 1m 11s) Loss: 0.3806(0.3512) Grad: 119248.4375 LR: 0.00000943
Epoch: [3][240/391] Elapsed 1m 41s (remain 1m 3s) Loss: 0.3819(0.3523) Grad: 78124.8984 LR: 0.00000911
Epoch: [3][260/391] Elapsed 1m 50s (remain 0m 54s) Loss: 0.3427(0.3530) Grad: 73294.2891 LR: 0.00000879
Epoch: [3][280/391] Elapsed 1m 58s (remain 0m 46s) Loss: 0.3162(0.3519) Grad: 94840.5938 LR: 0.00000847
Epoch: [3][300/391] Elapsed 2m 6s (remain 0m 37s) Loss: 0.3040(0.3529) Grad: 88447.9453 LR: 0.00000815
Epoch: [3][320/391] Elapsed 2m 15s (remain 0m 29s) Loss: 0.3175(0.3526) Grad: 145305.6250 LR: 0.00000784
Epoch: [3][340/391] Elapsed 2m 23s (remain 0m 21s) Loss: 0.3468(0.3517) Grad: 110218.4531 LR: 0.00000753
Epoch: [3][360/391] Elapsed 2m 32s (remain 0m 12s) Loss: 0.3926(0.3522) Grad: 65352.9297 LR: 0.00000722
Epoch: [3][380/391] Elapsed 2m 40s (remain 0m 4s) Loss: 0.2828(0.3515) Grad: 105423.3984 LR: 0.00000691
Epoch: [3][390/391] Elapsed 2m 44s (remain 0m 0s) Loss: 0.3242(0.3506) Grad: 59486.8477 LR: 0.00000676
EVAL: [0/98] Elapsed 0m 0s (remain 1m 0s) Loss: 0.3660(0.3660)
EVAL: [20/98] Elapsed 0m 5s (remain 0m 20s) Loss: 0.3246(0.3523)
EVAL: [40/98] Elapsed 0m 10s (remain 0m 14s) Loss: 0.3867(0.3606)
EVAL: [60/98] Elapsed 0m 15s (remain 0m 9s) Loss: 0.2762(0.3658)
EVAL: [80/98] Elapsed 0m 20s (remain 0m 4s) Loss: 0.4370(0.3674)
Epoch 3 - avg_train_loss: 0.3506 avg_val_loss: 0.3708 time: 190s
Epoch 3 - Score: 0.4654 Scores: [0.48894468, 0.47190648, 0.41822833, 0.47279745, 0.478483, 0.46188158]
Epoch 3 - Save Best Score: 0.4654 Model
EVAL: [97/98] Elapsed 0m 24s (remain 0m 0s) Loss: 0.3958(0.3708)
========== epoch: 3 training ==========
Epoch: [4][0/391] Elapsed 0m 0s (remain 5m 27s) Loss: 0.2887(0.2887) Grad: 248518.7812 LR: 0.00000674
Epoch: [4][20/391] Elapsed 0m 9s (remain 2m 41s) Loss: 0.4382(0.3248) Grad: 121720.1172 LR: 0.00000644
Epoch: [4][40/391] Elapsed 0m 17s (remain 2m 30s) Loss: 0.3732(0.3206) Grad: 125786.2812 LR: 0.00000614
Epoch: [4][60/391] Elapsed 0m 26s (remain 2m 20s) Loss: 0.3820(0.3343) Grad: 91117.5938 LR: 0.00000585
Epoch: [4][80/391] Elapsed 0m 34s (remain 2m 12s) Loss: 0.2734(0.3306) Grad: 57463.1992 LR: 0.00000556
Epoch: [4][100/391] Elapsed 0m 42s (remain 2m 3s) Loss: 0.2757(0.3275) Grad: 70494.5469 LR: 0.00000527
Epoch: [4][120/391] Elapsed 0m 51s (remain 1m 54s) Loss: 0.3366(0.3275) Grad: 87479.7891 LR: 0.00000499
Epoch: [4][140/391] Elapsed 0m 59s (remain 1m 46s) Loss: 0.3645(0.3282) Grad: 154315.6562 LR: 0.00000472
Epoch: [4][160/391] Elapsed 1m 8s (remain 1m 37s) Loss: 0.2984(0.3261) Grad: 93636.7109 LR: 0.00000445
Epoch: [4][180/391] Elapsed 1m 16s (remain 1m 29s) Loss: 0.2320(0.3264) Grad: 37266.0586 LR: 0.00000418
Epoch: [4][200/391] Elapsed 1m 25s (remain 1m 20s) Loss: 0.2876(0.3264) Grad: 71387.4922 LR: 0.00000392
Epoch: [4][220/391] Elapsed 1m 33s (remain 1m 12s) Loss: 0.3872(0.3260) Grad: 159705.8438 LR: 0.00000367
Epoch: [4][240/391] Elapsed 1m 42s (remain 1m 3s) Loss: 0.3811(0.3270) Grad: 47979.0312 LR: 0.00000342
Epoch: [4][260/391] Elapsed 1m 50s (remain 0m 55s) Loss: 0.2687(0.3269) Grad: 97840.6406 LR: 0.00000319
Epoch: [4][280/391] Elapsed 1m 59s (remain 0m 46s) Loss: 0.3409(0.3280) Grad: 173353.0156 LR: 0.00000295
Epoch: [4][300/391] Elapsed 2m 7s (remain 0m 38s) Loss: 0.3019(0.3289) Grad: 135909.9062 LR: 0.00000273
Epoch: [4][320/391] Elapsed 2m 15s (remain 0m 29s) Loss: 0.2928(0.3283) Grad: 69503.2734 LR: 0.00000251
Epoch: [4][340/391] Elapsed 2m 24s (remain 0m 21s) Loss: 0.3893(0.3289) Grad: 85478.1719 LR: 0.00000230
Epoch: [4][360/391] Elapsed 2m 32s (remain 0m 12s) Loss: 0.2778(0.3293) Grad: 120143.7656 LR: 0.00000210
Epoch: [4][380/391] Elapsed 2m 41s (remain 0m 4s) Loss: 0.3321(0.3280) Grad: 112291.6406 LR: 0.00000191
Epoch: [4][390/391] Elapsed 2m 45s (remain 0m 0s) Loss: 0.2963(0.3275) Grad: 108092.1172 LR: 0.00000182
EVAL: [0/98] Elapsed 0m 0s (remain 1m 3s) Loss: 0.2932(0.2932)
EVAL: [20/98] Elapsed 0m 5s (remain 0m 20s) Loss: 0.3838(0.3549)
EVAL: [40/98] Elapsed 0m 10s (remain 0m 14s) Loss: 0.3059(0.3605)
EVAL: [60/98] Elapsed 0m 15s (remain 0m 9s) Loss: 0.4385(0.3685)
EVAL: [80/98] Elapsed 0m 20s (remain 0m 4s) Loss: 0.3533(0.3697)
EVAL: [97/98] Elapsed 0m 24s (remain 0m 0s) Loss: 0.3778(0.3693)
Epoch 4 - avg_train_loss: 0.3275 avg_val_loss: 0.3693 time: 191s
Epoch 4 - Score: 0.4611 Scores: [0.48614553, 0.4448405, 0.42011937, 0.47291633, 0.4917833, 0.45061582]
Epoch 4 - Save Best Score: 0.4611 Model
========== epoch: 4 training ==========
Epoch: [5][0/391] Elapsed 0m 0s (remain 5m 11s) Loss: 0.3832(0.3832) Grad: inf LR: 0.00000181
Epoch: [5][20/391] Elapsed 0m 9s (remain 2m 41s) Loss: 0.2926(0.3217) Grad: 94323.1406 LR: 0.00000163
Epoch: [5][40/391] Elapsed 0m 17s (remain 2m 30s) Loss: 0.3275(0.3139) Grad: 108386.5547 LR: 0.00000146
Epoch: [5][60/391] Elapsed 0m 25s (remain 2m 20s) Loss: 0.3118(0.3121) Grad: 89941.1016 LR: 0.00000129
Epoch: [5][80/391] Elapsed 0m 34s (remain 2m 11s) Loss: 0.3067(0.3132) Grad: 165197.1406 LR: 0.00000114
Epoch: [5][100/391] Elapsed 0m 42s (remain 2m 2s) Loss: 0.3166(0.3114) Grad: 56191.2539 LR: 0.00000100
Epoch: [5][120/391] Elapsed 0m 51s (remain 1m 54s) Loss: 0.3278(0.3124) Grad: 94895.2734 LR: 0.00000086
Epoch: [5][140/391] Elapsed 0m 59s (remain 1m 45s) Loss: 0.3607(0.3128) Grad: 77948.6484 LR: 0.00000073
Epoch: [5][160/391] Elapsed 1m 8s (remain 1m 37s) Loss: 0.3402(0.3146) Grad: 113676.4844 LR: 0.00000062
Epoch: [5][180/391] Elapsed 1m 16s (remain 1m 28s) Loss: 0.3783(0.3133) Grad: 56143.0781 LR: 0.00000051
Epoch: [5][200/391] Elapsed 1m 24s (remain 1m 20s) Loss: 0.3733(0.3138) Grad: 80444.6562 LR: 0.00000042
Epoch: [5][220/391] Elapsed 1m 33s (remain 1m 11s) Loss: 0.3398(0.3139) Grad: 107842.5391 LR: 0.00000033
Epoch: [5][240/391] Elapsed 1m 41s (remain 1m 3s) Loss: 0.3067(0.3155) Grad: 119173.0391 LR: 0.00000025
Epoch: [5][260/391] Elapsed 1m 50s (remain 0m 54s) Loss: 0.2778(0.3151) Grad: 51814.4180 LR: 0.00000019
Epoch: [5][280/391] Elapsed 1m 58s (remain 0m 46s) Loss: 0.2593(0.3147) Grad: 122930.2344 LR: 0.00000013
Epoch: [5][300/391] Elapsed 2m 7s (remain 0m 38s) Loss: 0.3550(0.3154) Grad: 145206.0312 LR: 0.00000008
Epoch: [5][320/391] Elapsed 2m 15s (remain 0m 29s) Loss: 0.2629(0.3145) Grad: 94886.2891 LR: 0.00000005
Epoch: [5][340/391] Elapsed 2m 24s (remain 0m 21s) Loss: 0.3118(0.3141) Grad: 71085.5234 LR: 0.00000002
Epoch: [5][360/391] Elapsed 2m 32s (remain 0m 12s) Loss: 0.2707(0.3133) Grad: 113675.3047 LR: 0.00000001
Epoch: [5][380/391] Elapsed 2m 40s (remain 0m 4s) Loss: 0.3170(0.3126) Grad: 110467.9141 LR: 0.00000000
Epoch: [5][390/391] Elapsed 2m 45s (remain 0m 0s) Loss: 0.3775(0.3130) Grad: 110086.3047 LR: 0.00000000
EVAL: [0/98] Elapsed 0m 0s (remain 0m 59s) Loss: 0.3563(0.3563)
EVAL: [20/98] Elapsed 0m 5s (remain 0m 20s) Loss: 0.4007(0.3810)
EVAL: [40/98] Elapsed 0m 10s (remain 0m 14s) Loss: 0.3444(0.3758)
EVAL: [60/98] Elapsed 0m 15s (remain 0m 9s) Loss: 0.3400(0.3755)
EVAL: [80/98] Elapsed 0m 20s (remain 0m 4s) Loss: 0.2994(0.3713)
EVAL: [97/98] Elapsed 0m 24s (remain 0m 0s) Loss: 0.3803(0.3677)
Epoch 5 - avg_train_loss: 0.3130 avg_val_loss: 0.3677 time: 190s
Epoch 5 - Score: 0.4603 Scores: [0.485256, 0.44779205, 0.4199208, 0.47600335, 0.4788845, 0.45419946]
Epoch 5 - Save Best Score: 0.4603 Model
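The per-epoch Score printed in the log is the competition metric MCRMSE: the mean of the six per-target RMSEs. A minimal check, using the six per-target scores reported for epoch 5 above:

```python
import numpy as np

# Per-target RMSEs reported for epoch 5 in the training log above
scores = np.array([0.485256, 0.44779205, 0.4199208,
                   0.47600335, 0.4788845, 0.45419946])

# MCRMSE is simply the mean of the six column-wise RMSEs
mcrmse = scores.mean()
print(round(float(mcrmse), 4))  # → 0.4603
```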
七、Inference
The competition does not release ground-truth labels for the test set, so if you are only running this notebook offline for practice, this final step carries little meaning; it matters when you actually submit.
def inference_fn(test_loader, model, device):
    preds = []
    model.eval()    # disable dropout etc. for deterministic inference
    model.to(device)
    tk0 = tqdm(test_loader, total=len(test_loader))
    for inputs, label in tk0:  # the test Dataset still yields an (inputs, label) pair
        for k, v in inputs.items():
            inputs[k] = v.to(device)  # move each input tensor to the target device
        with torch.no_grad():  # no gradients needed at inference time
            y_preds = model(inputs)
        preds.append(y_preds.to('cpu').numpy())
    predictions = np.concatenate(preds)  # stack per-batch predictions into one array
    return predictions
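The loop can be sanity-checked offline with a toy model and loader. In the sketch below, DummyDataset and DummyModel are hypothetical stand-ins (not part of the competition code), and the tqdm progress bar is omitted; the point is only the dict-batching, no_grad, and concatenation pattern:

```python
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Dataset

class DummyDataset(Dataset):
    # Hypothetical stand-in: yields an (inputs dict, label) pair like the real Dataset
    def __init__(self, n=6):
        self.n = n
    def __len__(self):
        return self.n
    def __getitem__(self, idx):
        inputs = {'input_ids': torch.randn(8)}
        label = torch.zeros(6)
        return inputs, label

class DummyModel(nn.Module):
    # Hypothetical stand-in for the fine-tuned transformer regressor (6 outputs)
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 6)
    def forward(self, inputs):
        return self.fc(inputs['input_ids'])

def inference_fn(test_loader, model, device):
    preds = []
    model.eval()
    model.to(device)
    for inputs, label in test_loader:
        for k, v in inputs.items():
            inputs[k] = v.to(device)
        with torch.no_grad():
            y_preds = model(inputs)
        preds.append(y_preds.to('cpu').numpy())
    return np.concatenate(preds)

loader = DataLoader(DummyDataset(), batch_size=2)
predictions = inference_fn(loader, DummyModel(), torch.device('cpu'))
print(predictions.shape)  # → (6, 6)
```

PyTorch's default collate function batches the dict of tensors into a dict of batched tensors automatically, which is why the per-key `.to(device)` loop works.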
The predictions for the three test samples are:
array([[ 0.06905353, 0.16151421, -0.7520439 , -0.05804106, 0.86029375,
0.68256676],
[-0.10824093, -0.08520262, -0.72831357, -0.0021437 , 0.7458864 ,
0.6492575 ],
[ 0.07650095, 0.3073048 , -0.8738065 , 0.03434162, 0.63522017,
0.57341987]], dtype=float32)
Finally, write the predictions to the submission file:
test_df[CFG.target_cols] = prediction
submission = submission.drop(columns=CFG.target_cols).merge(test_df[['text_id'] + CFG.target_cols], on='text_id', how='left')
display(submission.head())
submission[['text_id'] + CFG.target_cols].to_csv('submission.csv', index=False)
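The drop-then-merge pattern above replaces the placeholder columns in sample_submission.csv with the predicted ones while preserving the submission's row order. A toy illustration (the column list is shortened to two targets for brevity; the real CFG.target_cols has six):

```python
import pandas as pd

target_cols = ['cohesion', 'syntax']  # shortened; the real list has six targets

submission = pd.DataFrame({'text_id': ['a', 'b'],
                           'cohesion': [0.0, 0.0], 'syntax': [0.0, 0.0]})
test_df = pd.DataFrame({'text_id': ['b', 'a'],
                        'cohesion': [2.5, 3.0], 'syntax': [3.5, 4.0]})

# Drop the placeholder columns, then left-merge the predictions on text_id,
# so row order follows the sample submission, not the test DataFrame
submission = (submission.drop(columns=target_cols)
                        .merge(test_df[['text_id'] + target_cols],
                               on='text_id', how='left'))
print(submission)
#   text_id  cohesion  syntax
# 0       a       3.0     4.0
# 1       b       2.5     3.5
```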
I am a learner myself, so if you have any questions about the code, feel free to reach out and discuss.
That concludes this walkthrough of the 2022 Kaggle NLP competition, Feedback Prize - English Language Learning.