1.
Reading and preprocessing data with pandas
# import pandas and load the dataset
import pandas as pd
data_file = 'abalone.data'  # assumed path: the columns below match the UCI Abalone dataset
names = ['Sex', 'Length', 'Diameter', 'Height', 'Whole_weight',
         'Shucked_weight', 'Viscera_weight', 'Shell_weight', 'Rings']
data = pd.read_csv(data_file, header=None, names=names)
print(data) # [4177 rows x 9 columns]
type(data) # pandas.core.frame.DataFrame
data.isnull().values.any() # False (check if there are any missing values)
data.isnull().sum() # total no. of missing values in each column
data.isnull().sum().sum() # total no. of missing values in entire dataframe
data.dtypes
data["Rings"] = data["Rings"].astype(float) # convert from int64 to float64
data["Sex"] = data["Sex"].astype("category") # convert from object to category
data["Sex"]
data.dtypes
data.describe() # summary of data
data["Height"].describe() # summary of variable "Height" only
data["Sex"].value_counts() # summary of variable "Sex"
Tensor sizes in torch
import torch
X = torch.arange(24).reshape(2, 3, 4)
len(X)
#output: 2
len(X) always returns the length of axis 0.
X.sum(axis=0).shape # torch.Size([3, 4])
X.sum(axis=1).shape # torch.Size([2, 4])
X.sum(axis=2).shape # torch.Size([2, 3])
Gradient computation

import torch
x = torch.tensor([1.0, 2.0], requires_grad=True)
y = torch.norm(x) # y is a fn of x
y # tensor(2.2361), torch.sqrt(torch.tensor(5.0))
y.backward() # take the gradient of y w.r.t. x via the backward method
x.grad # tensor([0.4472, 0.8944])
# check that the computed gradient matches the analytical gradient x / ||x||; should be tensor([True, True])
x.grad == x/torch.norm(x) # tensor([True, True])
x = torch.tensor([0.0, 0.0], requires_grad=True)
y = torch.norm(x)
y.backward() # differentiate y w.r.t. x
x.grad
# expected: tensor([nan, nan]), since x / ||x|| cannot be computed at the zero vector
# actual output: tensor([0., 0.])
x = torch.tensor(0.0, requires_grad=True)
y = torch.abs(x)
y.backward()
x.grad
# output: tensor(0.)
Thus the gradient is the unit vector x / ||x||. At x = 0 the gradient is mathematically undefined, but automatic differentiation returns zero. Be careful: discrepancies like this can appear in such cases.
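One way to sidestep the undefined point (a minimal sketch of my own, not from the original notes) is to smooth the norm with a small epsilon, which makes the function differentiable everywhere, including at the zero vector:
# smoothed norm: sqrt(sum(x^2) + eps) is differentiable everywhere
x = torch.tensor([0.0, 0.0], requires_grad=True)
eps = 1e-12
y = torch.sqrt(torch.sum(x * x) + eps)
y.backward()
x.grad # tensor([0., 0.]), well defined here since d/dx = x / sqrt(sum(x^2) + eps)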
What if the number of examples is not divisible by the batch size?
import random
import torch
from d2l import torch as d2l
data = d2l.SyntheticRegressionData(w=torch.tensor([2, -3.4]), b=4.2)
data.num_train # 1000
len(data.train_dataloader()) # 32 batches: 1000 / 32 = 31.25, rounded up because the last partial batch is kept by default
X, y = next(iter(data.train_dataloader()))
X.shape # torch.Size([32, 2])
y.shape # torch.Size([32, 1])
for i, batch in enumerate(data.train_dataloader()):
    print(i, len(batch[0]))
# the first 31 batches contain 32 examples each; the last batch contains only 8
# https://pytorch.org/docs/stable/data.html
@d2l.add_to_class(d2l.DataModule)  #@save
def get_tensorloader(self, tensors, train, indices=slice(0, None)):
    tensors = tuple(a[indices] for a in tensors)
    dataset = torch.utils.data.TensorDataset(*tensors)
    return torch.utils.data.DataLoader(dataset, self.batch_size, shuffle=train,
                                       drop_last=True)
# drop_last (bool, optional) – set to True to drop last incomplete batch
# if dataset size is not divisible by batch size. If False and dataset size is
# not divisible by batch size, then last batch will be smaller. (default: False)
len(data.train_dataloader()) # 31, 1000 // 32 = 31
for i, batch in enumerate(data.train_dataloader()):
    print(i, len(batch[0]))
# only 31 batches containing 32 examples each, last batch with 8 is dropped
2.
Linear regression with different loss functions
import random
import torch
from torch import nn
from d2l import torch as d2l
# data
data = d2l.SyntheticRegressionData(w=torch.tensor([2, -3.4]), b=4.2)
# MSE loss (Section 3.5.2 "Defining the Loss Function" of textbook)
# Use the @d2l.add_to_class decorator to attach a loss method to the LinearRegression class;
# nn.MSELoss then computes the mean squared error between predictions y_hat and targets y.
@d2l.add_to_class(d2l.LinearRegression)  #@save
def loss(self, y_hat, y):
    fn = nn.MSELoss()
    return fn(y_hat, y)
model = d2l.LinearRegression(lr=0.03)
trainer = d2l.Trainer(max_epochs=5)
trainer.fit(model, data)
w, b = model.get_w_b()
print(f'error in estimating w: {data.w - w.reshape(data.w.shape)}')
print(f'error in estimating b: {data.b - b}')
# Change loss fn to L1Loss ( https://pytorch.org/docs/stable/nn.html#loss-functions )
@d2l.add_to_class(d2l.LinearRegression) #@save
def loss(self, y_hat, y):
    fn = nn.L1Loss()
    return fn(y_hat, y)
model2 = d2l.LinearRegression(lr=0.03)
trainer = d2l.Trainer(max_epochs=5)
trainer.fit(model2, data)
w, b = model2.get_w_b()
print(f'error in estimating w: {data.w - w.reshape(data.w.shape)}')
print(f'error in estimating b: {data.b - b}')
MSE loss (which squares the difference between target and output) is more sensitive to outliers, whereas L1 loss considers only the absolute magnitude of the difference and is more robust to outliers.
Huber loss combines the best properties of the MSE and L1 losses. When the difference between target and output is small it reduces to the MSE loss, but when the difference is large it behaves like the L1 loss. It is therefore robust to outliers, like the L1 loss, when far from convergence, yet more stable near convergence, where it converges smoothly like the MSE loss. It is also differentiable everywhere. The δ parameter lets the user control how sensitive the loss is to the magnitude of the error.
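As a sketch (my own addition, assuming a PyTorch version that provides nn.HuberLoss, added in 1.9), the same add_to_class pattern used above for MSE and L1 can swap in the Huber loss; delta corresponds to the δ mentioned above:
@d2l.add_to_class(d2l.LinearRegression)  #@save
def loss(self, y_hat, y):
    fn = nn.HuberLoss(delta=1.0)  # quadratic for errors below delta, linear above
    return fn(y_hat, y)

model3 = d2l.LinearRegression(lr=0.03)
trainer = d2l.Trainer(max_epochs=5)
trainer.fit(model3, data)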
Cross-entropy
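As a minimal illustration (my own sketch; the values below are made up), nn.CrossEntropyLoss expects raw logits of shape (N, C) and integer class labels of shape (N,), and averages the negative log-softmax probability of the true class:
loss_fn = nn.CrossEntropyLoss()
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 1.5, 0.3]])  # scores for 2 examples, 3 classes
labels = torch.tensor([0, 1])             # true class indices
loss_fn(logits, labels)                   # mean of -log softmax(logits)[i, labels[i]]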
Parameterized ReLU (pReLU)
pReLU is more flexible than ReLU because it has an extra parameter α that is trained together with the other model parameters. Where ReLU vanishes for x < 0, pReLU is nonzero for x < 0, so it addresses the "dying ReLU" problem for negative inputs. It can also mitigate vanishing gradients more effectively than ReLU.
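A short sketch of the idea (my own illustration using torch.nn): nn.PReLU carries a learnable α that the optimizer updates along with the other model parameters.
# prelu(x) = x for x > 0, alpha * x for x <= 0, with alpha learned during training
prelu = nn.PReLU()                        # a single learnable alpha, initialized to 0.25
x = torch.tensor([-2.0, -0.5, 0.0, 1.0])
prelu(x)                                  # tensor([-0.5000, -0.1250, 0.0000, 1.0000], grad_fn=...)
list(prelu.parameters())                  # [Parameter containing: tensor([0.2500], requires_grad=True)]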
3.
Equivalent solutions of an MLP
Dependence between variables in an MLP
Dropout
Dropout and weight decay can be applied together to reduce overfitting. Weight decay constrains the magnitude of the weights, while dropout improves generalization by preventing over-reliance on particular nodes. When dropout is used together with weight decay, the validation loss and accuracy curves are smoother and fluctuate less than with weight decay alone.
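A rough sketch of combining the two (my own illustration; the layer sizes, dropout rate, and weight-decay value are arbitrary choices, not from the original notes):
import torch
from torch import nn

# an MLP with a dropout layer, trained with weight decay applied by the optimizer
net = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 256), nn.ReLU(),
    nn.Dropout(p=0.5),       # randomly zeroes hidden activations during training
    nn.Linear(256, 10),
)
# weight_decay adds an L2 penalty on the parameters at every update step
optimizer = torch.optim.SGD(net.parameters(), lr=0.1, weight_decay=1e-4)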