Preface
BN (Batch Normalization) was proposed mainly to speed up the training of deep neural networks. A neural network as a whole can be viewed as a high-order, complex function whose parameters are tuned during training so that it can fit all sorts of complicated data distributions. A network typically has many layers, and each layer can be seen as a sub-function that fits its own sub-distribution. Since a layer's input is the previous layer's output, every update to the previous layer's parameters during training changes that output; this slows down the learning of the next layer and makes the whole model harder to train. The original paper calls this phenomenon "internal covariate shift". To counter it, the authors standardize the previous layer's output so as to adjust the distribution of the next layer's input, thereby weakening the effect of internal covariate shift. BN is usually placed after a convolution layer and before the activation. For more details, see the original paper "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift".
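As a minimal sketch of that conv → BN → activation ordering (the channel counts and kernel size below are made up for illustration, not taken from the paper):

import torch.nn as nn

# Conv -> BN -> ReLU, the placement described above; 3 in-channels and 64 out-channels are arbitrary.
block = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=3, padding=1, bias=False),  # the conv bias is redundant right before BN
    nn.BatchNorm2d(64),
    nn.ReLU(inplace=True),
)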
Because BN computes statistics across samples, it is inevitably sensitive to the batch size. Moreover, for sequence data (text, speech, etc.) the samples are usually padded to a common length so that they can be fed to the model in batches, and computing statistics across samples then mixes in noise from the padded positions. LN (Layer Normalization) was introduced to address these two problems: it normalizes the features of each sample independently. See the original paper "Layer Normalization" for details.
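A small sketch of that padding issue (the shapes and the zero padding are made up for illustration): BN's per-channel statistics include the padded positions of every sample, whereas a per-time-step LN never mixes positions or samples.

import torch
from torch.nn import BatchNorm1d, LayerNorm

x = torch.randn(2, 3, 4)                  # [batch, channels, time]
x[1, :, 2:] = 0.0                         # the second sequence is padded from length 2 to 4
bn_out = BatchNorm1d(3, affine=False)(x)  # channel statistics are polluted by the padded zeros
ln_out = LayerNorm(3)(x.transpose(1, 2))  # statistics are computed per sample and per time step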
Similar techniques include Instance Normalization, Group Normalization, and Switchable Normalization, which are not covered further here.
The sections below skip most of the remaining details and go straight to the code.
Batch Normalization (BN)
BN normalizes each channel across all samples at once: there are as many (mean, variance) pairs as there are channels, and the values of each channel are scaled independently.
Assume the data has the format $V_{[B,C,H,W]}$, where the dimensions are batch_size, number of channels, height, and width. Then for every channel index $c \in [0, C)$, one mean and one variance are computed:
$$u_c = \frac{1}{B \cdot H \cdot W} \sum_{b\in[0,B),\, h\in[0,H),\, w\in[0,W)} V_{[b,c,h,w]}$$
$$\theta^2_c = \frac{1}{B \cdot H \cdot W} \sum_{b\in[0,B),\, h\in[0,H),\, w\in[0,W)} \left(V_{[b,c,h,w]} - u_c\right)^2$$
Next, every value at that channel index is scaled:
$$\hat{V}_{[:,c,:,:]} \leftarrow \gamma_c \cdot \frac{V_{[:,c,:,:]} - u_c}{\sqrt{\theta^2_c + \epsilon}} + \beta_c$$
where $\epsilon$ is a very small constant that keeps the denominator from becoming zero, and $\gamma_c$, $\beta_c$ are the learnable parameters of that channel.
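As a quick numeric check of these formulas (the four values are made up): if one channel holds the values $\{1, 2, 3, 4\}$ over $B \cdot H \cdot W = 4$ positions, then $u_c = 2.5$ and $\theta^2_c = 1.25$; with $\gamma_c = 1$, $\beta_c = 0$ and $\epsilon \approx 0$, the normalized values come out to roughly $\{-1.34, -0.45, 0.45, 1.34\}$, i.e. zero mean and unit variance. The listing below checks the same computation against PyTorch's built-in BatchNorm layers.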
import torch
from torch.nn import BatchNorm1d, BatchNorm2d, BatchNorm3d
import numpy as np

np.random.seed(0)


class myBatchNorm(torch.nn.Module):
    def __init__(self,
                 size,
                 dim,
                 eps: float = 1e-6) -> None:
        super().__init__()
        # one scale/shift pair per channel; `size` is chosen so it broadcasts against the input
        self.gamma = torch.nn.Parameter(torch.ones(size))
        self.beta = torch.nn.Parameter(torch.zeros(size))
        self.eps = eps
        self.dim = dim  # dimensions to reduce over: batch plus all spatial dimensions

    def forward(self, tensor: torch.Tensor):  # pylint: disable=arguments-differ
        mean = tensor.mean(self.dim, keepdim=True)
        std = tensor.std(self.dim, unbiased=False, keepdim=True)
        return self.gamma * (tensor - mean) / (std + self.eps) + self.beta


print("-----1D BN------")
val_tensor = torch.from_numpy(np.random.randn(1, 3, 4)).float()
BN1 = BatchNorm1d(3, affine=False)(val_tensor)
print(BN1)
myBN1 = myBatchNorm((1, 3, 1), (0, 2))(val_tensor)
print(myBN1)

print("-----2D BN------")
val_tensor = torch.from_numpy(np.random.randn(1, 3, 4, 5)).float()
BN2 = BatchNorm2d(3, affine=False)(val_tensor)
print(BN2)
myBN2 = myBatchNorm((1, 3, 1, 1), (0, 2, 3))(val_tensor)
print(myBN2)

print("-----3D BN------")
val_tensor = torch.from_numpy(np.random.randn(1, 2, 3, 4, 5)).float()
BN3 = BatchNorm3d(2, affine=False)(val_tensor)
print(BN3)
myBN3 = myBatchNorm((1, 2, 1, 1, 1), (0, 2, 3, 4))(val_tensor)
print(myBN3)
The output is as follows:
-----1D BN------
tensor([[[ 0.5905, -1.3359, -0.5187, 1.2640],
[ 1.3397, -1.2973, 0.4893, -0.5317],
[-0.9773, -0.1110, -0.5604, 1.6487]]])
tensor([[[ 0.5905, -1.3359, -0.5187, 1.2640],
[ 1.3397, -1.2973, 0.4893, -0.5317],
[-0.9773, -0.1110, -0.5604, 1.6488]]], grad_fn=<AddBackward0>)
-----2D BN------
tensor([[[[ 0.4834, -0.1121, 0.1880, 0.0854, 1.1662],
[-0.4165, 0.0662, -1.0209, -2.6032, 0.3834],
[ 0.5797, -0.9166, 1.8886, -1.5800, -0.1828],
[-0.3997, 1.2022, 1.1431, -0.0811, 0.1268]],
[[-0.5048, -1.5601, 0.0165, 0.5033, 1.5403],
[ 1.5133, -0.0216, 0.0605, -0.6600, -1.0187],
[-1.2951, 2.2359, -0.1397, -0.0706, -0.8572],
[ 1.1031, -1.2059, 0.1470, -0.5122, 0.7259]],
[[-0.2520, -1.2639, 0.4771, 1.1667, 0.6201],
[ 0.9766, -0.4386, -0.0283, -0.4962, -0.0235],
[-0.7087, -2.0882, 0.7877, -0.0873, -1.9430],
[ 1.2188, -0.8510, 0.5981, 1.6211, 0.7145]]]])
tensor([[[[ 0.4834, -0.1121, 0.1880, 0.0854, 1.1662],
[-0.4165, 0.0662, -1.0209, -2.6032, 0.3834],
[ 0.5797, -0.9166, 1.8886, -1.5800, -0.1828],
[-0.3997, 1.2022, 1.1431, -0.0811, 0.1268]],
[[-0.5048, -1.5601, 0.0165, 0.5033, 1.5403],
[ 1.5133, -0.0216, 0.0605, -0.6600, -1.0187],
[-1.2951, 2.2359, -0.1397, -0.0706, -0.8572],
[ 1.1031, -1.2059, 0.1470, -0.5122, 0.7260]],
[[-0.2520, -1.2639, 0.4771, 1.1667, 0.6201],
[ 0.9766, -0.4386, -0.0283, -0.4962, -0.0235],
[-0.7087, -2.0882, 0.7877, -0.0873, -1.9430],
[ 1.2188, -0.8510, 0.5981, 1.6211, 0.7145]]]],
grad_fn=<AddBackward0>)
-----3D BN------
tensor([[[[[ 0.8306, -1.5469, 0.0926, -0.9961, -1.1823],
[-0.8900, -0.6223, -0.2541, -1.4771, 0.5917],
[ 0.1560, -1.8487, 1.1800, 1.5882, 0.8701],
[-0.4905, -1.3826, 0.7456, -0.7141, 0.9138]],
[[-0.1018, 0.6676, 0.0465, 0.3972, -0.2998],
[ 1.4780, -0.1832, 0.0922, 1.5754, -1.6599],
[-1.5826, 0.6604, -1.4851, 1.6360, -0.7245],
[-1.0588, 1.6152, 1.1722, 1.5598, 0.5970]],
[[-1.1727, 1.6023, -0.5787, 0.4932, 0.6382],
[-0.4656, 0.3046, 0.6131, 0.0666, -1.4112],
[-0.0117, 1.0179, -1.0059, -0.4602, -0.7461],
[ 1.5415, 0.3629, 0.0977, -1.0813, 0.2297]]],
[[[-0.5496, 0.1743, -0.5101, 0.8350, 0.7327],
[-0.0719, 0.5476, -0.9788, -1.3869, 0.5920],
[ 0.3125, 0.7926, 2.5845, 1.1098, -0.7940],
[ 1.2866, -1.2072, -0.3315, 0.0717, 1.8979]],
[[-0.6218, -0.7055, 0.0407, -0.5384, 1.2965],
[-0.9653, -1.0345, -0.3071, -0.3689, 2.1195],
[ 1.1148, 0.2314, -1.1145, 1.0072, -0.8836],
[-1.4418, 1.3594, 0.4665, 1.0856, 0.4684]],
[[ 1.0199, -0.5257, -0.9185, 0.8403, -0.6819],
[-0.5652, -0.3253, 0.1596, -0.2212, -1.2677],
[-0.5181, -2.1374, 0.7825, -1.5005, -0.9904],
[ 0.1951, -0.6164, 1.7233, -1.1836, 0.4154]]]]])
tensor([[[[[ 0.8306, -1.5469, 0.0926, -0.9961, -1.1823],
[-0.8900, -0.6223, -0.2541, -1.4771, 0.5917],
[ 0.1560, -1.8487, 1.1800, 1.5882, 0.8701],
[-0.4905, -1.3826, 0.7456, -0.7141, 0.9138]],
[[-0.1018, 0.6676, 0.0465, 0.3972, -0.2998],
[ 1.4780, -0.1832, 0.0922, 1.5754, -1.6599],
[-1.5826, 0.6604, -1.4851, 1.6360, -0.7245],
[-1.0588, 1.6153, 1.1722, 1.5598, 0.5970]],
[[-1.1727, 1.6024, -0.5787, 0.4932, 0.6382],
[-0.4656, 0.3046, 0.6131, 0.0666, -1.4112],
[-0.0117, 1.0179, -1.0059, -0.4602, -0.7461],
[ 1.5415, 0.3629, 0.0977, -1.0813, 0.2297]]],
[[[-0.5496, 0.1743, -0.5101, 0.8350, 0.7327],
[-0.0719, 0.5476, -0.9788, -1.3870, 0.5920],
[ 0.3125, 0.7926, 2.5845, 1.1098, -0.7940],
[ 1.2866, -1.2072, -0.3315, 0.0717, 1.8979]],
[[-0.6218, -0.7055, 0.0407, -0.5384, 1.2965],
[-0.9653, -1.0346, -0.3071, -0.3689, 2.1195],
[ 1.1148, 0.2314, -1.1145, 1.0072, -0.8836],
[-1.4418, 1.3594, 0.4665, 1.0856, 0.4684]],
[[ 1.0199, -0.5257, -0.9185, 0.8403, -0.6819],
[-0.5652, -0.3253, 0.1596, -0.2212, -1.2677],
[-0.5181, -2.1374, 0.7825, -1.5005, -0.9904],
[ 0.1951, -0.6164, 1.7233, -1.1836, 0.4154]]]]],
grad_fn=<AddBackward0>)
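A brief note on the tiny mismatches in the last decimal place above (e.g. 1.6487 vs 1.6488): the built-in BatchNorm layers divide by $\sqrt{Var[x] + \epsilon}$ with a default eps of 1e-5, while the toy myBatchNorm divides by $(std + 10^{-6})$, so the two only agree approximately. The toy module also keeps no running statistics, so it matches the built-in layers only in training mode. A minimal sketch of the closer formulation (the tensor here is just a stand-in):

import numpy as np
import torch
from torch.nn import BatchNorm2d

x = torch.from_numpy(np.random.randn(1, 3, 4, 5)).float()
mean = x.mean((0, 2, 3), keepdim=True)
var = x.var((0, 2, 3), unbiased=False, keepdim=True)
x_hat = (x - mean) / torch.sqrt(var + 1e-5)  # BatchNorm2d's documented formula, eps=1e-5
print(torch.allclose(x_hat, BatchNorm2d(3, affine=False)(x), atol=1e-6))  # expected: True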
Layer Normalization (LN)
LN normalizes each sample on its own. Assume the data has the format $V_{[B,T,C]}$, where the dimensions are batch_size, sequence length, and feature dimension. Normalization is usually applied to the features of every time step of every sample, although it can also be applied to each sample as a whole.
The formulas below use per-time-step normalization as the example.
$$u_{b,t} = \frac{1}{C} \sum_{c\in[0,C)} V_{[b,t,c]}$$
$$\theta^2_{b,t} = \frac{1}{C} \sum_{c\in[0,C)} \left(V_{[b,t,c]} - u_{b,t}\right)^2$$
Next, the corresponding scaling is applied:
$$\hat{V}_{[b,t,:]} \leftarrow \gamma \odot \frac{V_{[b,t,:]} - u_{b,t}}{\sqrt{\theta^2_{b,t} + \epsilon}} + \beta$$
where $\epsilon$ is a very small constant that keeps the denominator from becoming zero; note that $\gamma, \beta \in \mathbb{R}^C$ are per-feature parameters shared by all samples (and all time steps).
import torch
from torch.nn import LayerNorm as LN
import numpy as np

np.random.seed(0)
val = np.random.randn(2, 3, 4)
val_tensor = torch.from_numpy(val).float()


class myLayerNorm(torch.nn.Module):
    def __init__(self,
                 size,
                 dim,
                 eps: float = 1e-6) -> None:
        super().__init__()
        # `size` matches the normalized shape so gamma/beta broadcast against the input
        self.gamma = torch.nn.Parameter(torch.ones(size))
        self.beta = torch.nn.Parameter(torch.zeros(size))
        self.eps = eps
        self.dim = dim  # dimensions to reduce over within a single sample

    def forward(self, tensor: torch.Tensor):  # pylint: disable=arguments-differ
        mean = tensor.mean(self.dim, keepdim=True)
        std = tensor.std(self.dim, unbiased=False, keepdim=True)
        return self.gamma * (tensor - mean) / (std + self.eps) + self.beta


print("Normalize over each whole sample")
LN2 = LN([3, 4])(val_tensor)
myLN2 = myLayerNorm((3, 4), (1, 2))(val_tensor)
print("torch LN")
print(LN2)
print("my LN")
print(myLN2)

print("Normalize each time step of each sample")
LN1 = LN(4)(val_tensor)
print(LN1)
myLN1 = myLayerNorm((4), 2)(val_tensor)
print(myLN1)
The output is as follows:
Normalize over each whole sample
torch LN
tensor([[[ 1.1009, -0.3772, 0.2498, 1.6177],
[ 1.2131, -1.8700, 0.2188, -0.9749],
[-0.9227, -0.3659, -0.6548, 0.7652]],
[[ 0.7022, 0.0685, 0.3878, 0.2786],
[ 1.4288, -0.2555, 0.2582, -0.8987],
[-2.5826, 0.5957, 0.8047, -0.7878]]],
grad_fn=<NativeLayerNormBackward0>)
my LN
tensor([[[ 1.1009, -0.3772, 0.2498, 1.6177],
[ 1.2131, -1.8700, 0.2188, -0.9749],
[-0.9227, -0.3659, -0.6548, 0.7652]],
[[ 0.7022, 0.0685, 0.3878, 0.2786],
[ 1.4288, -0.2555, 0.2582, -0.8987],
[-2.5826, 0.5957, 0.8047, -0.7878]]], grad_fn=<AddBackward0>)
Normalize each time step of each sample
tensor([[[ 0.5905, -1.3359, -0.5187, 1.2640],
[ 1.3397, -1.2973, 0.4893, -0.5317],
[-0.9773, -0.1110, -0.5604, 1.6487]],
[[ 1.4983, -1.2706, 0.1247, -0.3525],
[ 1.5190, -0.4557, 0.1465, -1.2098],
[-1.5448, 0.8043, 0.9587, -0.2182]]],
grad_fn=<NativeLayerNormBackward0>)
tensor([[[ 0.5905, -1.3359, -0.5187, 1.2640],
[ 1.3397, -1.2973, 0.4893, -0.5317],
[-0.9773, -0.1110, -0.5604, 1.6488]],
[[ 1.4985, -1.2707, 0.1247, -0.3525],
[ 1.5190, -0.4557, 0.1465, -1.2098],
[-1.5448, 0.8043, 0.9587, -0.2182]]], grad_fn=<AddBackward0>)
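One more small sketch, tying back to the motivation in the preface: because LN's statistics are computed within a single sample, the result for a sample does not depend on what else is in the batch (the tensor below is a made-up stand-in):

import torch
from torch.nn import LayerNorm

ln = LayerNorm(4)
x = torch.randn(2, 3, 4)
alone = ln(x[:1])         # normalize the first sample on its own
in_batch = ln(x)[:1]      # normalize it together with the rest of the batch
print(torch.allclose(alone, in_batch))  # expected: True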