ResNet: network architecture explained, building the network, and transfer learning


Preface:

Reference material comes from the uploader's video: 6.1 ResNet network architecture, BN, and transfer learning explained (bilibili)

The uploader's code and slides: https://github.com/WZMIAOMIAO/deep-learning-for-image-processing

I. Introduction

ResNet was proposed by Microsoft Research in 2015. It took first place in both the classification and the object detection tasks of that year's ImageNet competition, and first place in both object detection and image segmentation on the COCO dataset.

Original paper: [1512.03385] Deep Residual Learning for Image Recognition (arxiv.org)

Key innovations of ResNet:

An ultra-deep network structure (more than 1000 layers is possible)

The residual structure (residual block)

Batch Normalization to speed up training (dropout is no longer used)


The figure below is a simplified structural comparison of the 34-layer ResNet model and the VGG model:

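II. Detailed explanation

1. Why use residual blocks?

Before ResNet, conventional convolutional neural networks were built by stacking a series of convolutional layers and pooling layers.

Intuitively, the deeper the network, the richer the feature information and the better the model should perform. Experiments show, however, that two problems appear once the stack reaches a certain depth:

Vanishing or exploding gradients

The names are largely self-explanatory:
If the error gradient of every layer is smaller than 1, then during backpropagation the gradient approaches 0 as the network gets deeper.
Conversely, if the error gradient of every layer is greater than 1, the gradient keeps growing as the network gets deeper.

This problem is usually addressed by standardizing the input data, careful weight initialization, and BN (Batch Normalization).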
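The effect described above can be illustrated with a toy calculation. The sketch below is only a simplification under a stated assumption: the gradient that reaches the first layer is modelled as the product of one scalar factor per layer (real gradients are products of matrices), and the function name `backprop_scale` is made up for this illustration.

```python
# Toy model: the gradient reaching the first layer is treated as the product of
# one scalar factor per layer (a simplification; real gradients are matrix products).
def backprop_scale(per_layer_factor: float, depth: int) -> float:
    """Multiply per_layer_factor by itself `depth` times."""
    scale = 1.0
    for _ in range(depth):
        scale *= per_layer_factor
    return scale


for depth in (10, 50, 100):
    shrinking = backprop_scale(0.9, depth)  # factor < 1: tends towards 0
    growing = backprop_scale(1.1, depth)    # factor > 1: keeps growing
    print(f"depth={depth:3d}  0.9^depth={shrinking:.2e}  1.1^depth={growing:.2e}")
```

Even factors as close to 1 as 0.9 and 1.1 change the gradient by several orders of magnitude after 100 layers, which is why normalization and initialization matter so much for deep stacks.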

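Degradation problem: even after vanishing and exploding gradients have been dealt with, a deeper network can still perform worse than a shallower one.

This problem is addressed by the residual structure.

In short, once the network is stacked beyond a certain depth, the deeper network can end up performing worse than the shallower one.

As the figure below shows, the 20-layer network has a lower error than the 56-layer network:

In the original paper:

Vanishing and exploding gradients are handled by data preprocessing and by using BN (Batch Normalization) layers in the network.

The residual structure is proposed to alleviate the degradation problem. The figure below shows convolutional networks built with the residual structure: as the network gets deeper, performance does not get worse but keeps improving. (Dashed lines are training error, solid lines are test error.)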

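2. What is a residual network?

To address the degradation problem in deep networks, some layers can be deliberately wired so that their output skips over the following layers and feeds a later layer directly, which weakens the tight coupling between consecutive layers. Networks built this way are called residual networks (ResNets).

Residual block

Let F(x) be a mapping made up of just two layers, with x as its input and F(x) as its output, and assume the two have the same dimensions. During training we would normally adjust the weights w and biases b so that F(x) fits an ideal mapping H(x) from input to output. If we instead let F(x) approximate H(x) - x, the final output becomes F(x) + x, where the addition is element-wise. The connection that runs directly from the input to the output is called the shortcut, and the whole structure is a residual block, the basic building block of ResNet.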
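Written out as equations, the reformulation above looks as follows. This is a compact restatement in the paragraph's own symbols (the output symbol y and the identity matrix I are introduced here only for the illustration); the second line differentiates the first and shows that the shortcut contributes an identity term, so the gradient always has a direct path back to x.

```latex
\begin{aligned}
y &= F(x) + x, \qquad F(x) \approx H(x) - x \\
\frac{\partial y}{\partial x} &= \frac{\partial F(x)}{\partial x} + I
\end{aligned}
```

If the ideal mapping H(x) happens to be the identity, the two layers only need to push F(x) towards zero, which is easier than making a stack of nonlinear layers reproduce x exactly.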

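ResNet follows VGG's all-3x3 convolution design. A residual block starts with two 3x3 convolutional layers that have the same number of output channels. Each convolutional layer is followed by a BN layer and a ReLU activation, and the input is added just before the final ReLU. This structure is used in the shallower networks such as ResNet34. When the number of input channels is large, 1x1 convolutional layers are introduced to reduce and then restore the channel count; this structure is called a bottleneck block and is typically used in the deeper networks. Both are shown in the figure below:

Note: the output feature maps of the main branch and of the shortcut must have the same shape.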

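As a quick calculation, suppose the input and output feature maps of both residual structures have a depth of 256, as in the figure below (note the change made to the left-hand structure):

The two residual structures then need the following numbers of parameters:
Left: 3 × 3 × 256 × 256 + 3 × 3 × 256 × 256 = 1,179,648
Right: 1 × 1 × 256 × 64 + 3 × 3 × 64 × 64 + 1 × 1 × 64 × 256 = 69,632
Note: number of parameters in a convolutional layer = kernel size × kernel depth × number of kernels = kernel size × input feature depth × output feature depth (biases not counted)
The right-hand residual structure is clearly the better fit for building deep networks.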
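The arithmetic above can be reproduced with a few lines of Python. The helper name `conv_params` is made up for this sketch; it implements only the formula from the note (square kernels, no bias).

```python
def conv_params(kernel_size: int, in_channels: int, out_channels: int) -> int:
    """Number of weights in one convolutional layer with a square kernel, bias not counted."""
    return kernel_size * kernel_size * in_channels * out_channels


# Left: two 3x3 convolutions working on 256 channels throughout.
basic = conv_params(3, 256, 256) + conv_params(3, 256, 256)

# Right: 1x1 reduces 256 -> 64, 3x3 works on 64 channels, 1x1 restores 64 -> 256.
bottleneck = conv_params(1, 256, 64) + conv_params(3, 64, 64) + conv_params(1, 64, 256)

print(basic, bottleneck, round(basic / bottleneck, 1))
```

Under these assumptions the bottleneck needs roughly 17 times fewer weights than the two stacked 3x3 convolutions for the same 256-channel input and output.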

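3. Network structure

In the 34-layer network in the figure below, some residual blocks have a solid-line shortcut while others have a dashed-line one.

On the dashed-line shortcuts, a 1×1 convolution adjusts the dimensions: the feature map is downsampled in height and width, and its depth is changed to the channel count required by the following residual structures.

The figure below is the configuration table from the original paper for ResNets of different depths. For each residual structure the table lists the kernel size and the number of kernels on the main branch, and "×N" means the residual structure is repeated N times.

As the table caption in the paper states, the first residual structure in each of conv3_x, conv4_x and conv5_x is a dashed-line one, because it has to adjust the shape of the input feature map: height and width are halved, and the depth is changed to the channel count required by the following residual structures.

Note that for ResNet50/101/152 the first residual structure of conv2_x is also a dashed-line one, because it has to adjust the number of channels. According to the table, the feature map coming out of the 3x3 max pool has shape [56, 56, 64], but the solid-line residual structures in conv2_x expect an input of shape [56, 56, 256] (only then do input and output have the same shape, so that the shortcut output can be added to the main-branch output). The first residual structure therefore changes the shape from [56, 56, 64] to [56, 56, 256]. Only the channel dimension changes here; height and width stay the same (whereas the first dashed-line structure of conv3_x, conv4_x and conv5_x both adjusts the channels and halves the height and width).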
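The shape changes described above can be traced with plain arithmetic. The sketch below assumes a 224x224 input and the ResNet50-style stage widths (64, 128, 256, 512 with an expansion of 4); the helper `conv_out` and the variable names are made up for the illustration, and shapes are written as height, width, channels to match the text.

```python
def conv_out(size: int, kernel: int, stride: int, padding: int) -> int:
    """Output height/width of a convolution or pooling layer (result is rounded down)."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224
size = conv_out(size, kernel=7, stride=2, padding=3)  # 7x7 conv1, stride 2
size = conv_out(size, kernel=3, stride=2, padding=1)  # 3x3 max pool, stride 2
channels = 64
expansion = 4  # Bottleneck: output depth is 4x the main-branch width

# (stage name, main-branch width, stride of the first residual structure)
stages = [("conv2_x", 64, 1), ("conv3_x", 128, 2), ("conv4_x", 256, 2), ("conv5_x", 512, 2)]
shapes = []
for name, width, stride in stages:
    out_channels = width * expansion
    # A dashed-line shortcut is needed whenever the size or the depth changes.
    dashed = stride != 1 or channels != out_channels
    size = conv_out(size, kernel=1, stride=stride, padding=0)  # 1x1 conv on the shortcut
    channels = out_channels
    shapes.append((name, size, size, channels, dashed))

for row in shapes:
    print(row)
```

The first stage keeps 56x56 and only changes the depth from 64 to 256, while each later stage halves the height and width, which matches the distinction drawn in the paragraph above.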

Network depth	ResNet variants	Residual structure
Shallow	ResNet18/34	BasicBlock
Deep	ResNet50/101/152	Bottleneck

Detailed residual structures of ResNet 18/34:

Detailed residual structures of ResNet 50/101/152:

4. How Batch Normalization works

Images are usually standardized during preprocessing, which speeds up convergence. As shown in the figure below, the input to Conv1 is therefore a feature map that follows a certain distribution, but the feature map fed to Conv2 no longer necessarily does. (Here "follows a distribution" does not refer to the data of one particular feature map; in theory it refers to the feature maps produced by the whole training set.) The purpose of Batch Normalization is to make each channel of the feature map follow a distribution with mean 0 and variance 1.
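As a rough illustration of the normalization step, the sketch below normalizes one channel using the mean and variance pooled over every position of every sample in a batch. It is a simplified sketch, not the library implementation: the function name and the numbers are made up, and the learnable scale and shift that a full BN layer applies afterwards, as well as the running statistics used at evaluation time, are left out.

```python
def batch_norm_channel(values, eps=1e-5):
    """Normalize all values of one channel, pooled over the batch, to mean 0 and variance ~1."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n  # biased variance over the batch
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


# One channel of two 2x2 feature maps (batch size 2), flattened into a single list.
channel_values = [1.0, 1.0, 1.0, 2.0,    # feature map of sample 1
                  -1.0, 0.0, 0.0, 2.0]   # feature map of sample 2
normalized = batch_norm_channel(channel_values)

new_mean = sum(normalized) / len(normalized)
new_var = sum((v - new_mean) ** 2 for v in normalized) / len(normalized)
print(new_mean, new_var)
```

The statistics are shared by both samples, which is why they only approximate the training-set statistics and why the batch size matters.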

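Points to note when using BN

(1) Set the training flag to True during training and to False during validation. In PyTorch this is controlled with the model's model.train() and model.eval() methods.

(2) Use as large a batch size as possible; performance can be poor with a small one. The larger the batch, the closer the batch mean and variance are to those of the whole training set.

(3) Place the BN layer between the convolutional layer (Conv) and the activation layer (e.g. ReLU), and do not use a bias in the convolutional layer: BN subtracts the per-channel mean, so a constant bias is cancelled out and has no effect.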
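Point (3) can be checked numerically. The sketch below uses made-up values for one channel of a convolution output and a made-up constant bias, and shows that normalizing with or without the bias gives the same result up to floating-point rounding. The helper `normalize` is a hypothetical stand-in for the normalization step of BN, not a library function.

```python
def normalize(values, eps=1e-5):
    """Subtract the mean and divide by the standard deviation, as BN does per channel."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return [(v - mean) / (var + eps) ** 0.5 for v in values]


conv_output = [0.5, -1.2, 3.0, 0.7]          # hypothetical conv outputs for one channel
bias = 2.5                                   # hypothetical constant conv bias
with_bias = [v + bias for v in conv_output]

# Adding a constant shifts the mean by the same constant and leaves the variance unchanged,
# so the subtraction of the mean removes the bias again.
max_diff = max(abs(a - b) for a, b in zip(normalize(conv_output), normalize(with_bias)))
print(max_diff)
```

This is the reason the convolution layers in the code later in this article are created with bias=False.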

III. Transfer learning

Transfer learning is a machine learning approach that transfers knowledge from one domain (the source domain) to another (the target domain) so that learning in the target domain achieves better results.

In transfer learning, the knowledge learned on a source task is used to help learn a target task. For example, a trained image classification network can be reused for another image-related task, and what a network learns in a simulated environment can be transferred to a network operating in the real environment. A typical example is loading a trained VGG network, a large-scale classifier that assigns images to 1000 classes, and then using it for another task such as medical image classification.

As shown in the figure below, a neural network extracts progressively deeper information from an image layer by layer, so a pretrained network effectively acts as a feature extractor.

Advantages of transfer learning:

A good result can be reached quickly

A good result can be reached even when the dataset is small

Note: when using someone else's pretrained parameters, pay attention to the preprocessing they used.

Common ways of doing transfer learning:

Load the weights, then train all parameters

Load the weights, then train only the parameters of the last few layers

Load the weights, add one fully connected layer on top of the original network, and train only that final fully connected layer
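One way the "train only the last layers" option might look in PyTorch is sketched below. `TinyNet` is a hypothetical stand-in for a pretrained network (it is not the ResNet built later in this article), and the 5-class head is an assumption for the example: every existing parameter is frozen, and the classifier is then replaced by a fresh layer, which is trainable by default.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for a pretrained network: feature extractor plus `fc` head."""

    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(8),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d((1, 1)),
        )
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x):
        x = self.features(x)
        x = torch.flatten(x, 1)
        return self.fc(x)


net = TinyNet()  # in practice the pretrained weights would be loaded at this point

# Freeze every existing parameter ...
for param in net.parameters():
    param.requires_grad = False

# ... then replace the head; a newly created layer has requires_grad=True.
net.fc = nn.Linear(net.fc.in_features, 5)

trainable = [name for name, p in net.named_parameters() if p.requires_grad]
optimizer = torch.optim.Adam([p for p in net.parameters() if p.requires_grad], lr=1e-4)
print(trainable)
```

Skipping the freezing loop turns this into the first option, where all parameters are fine-tuned after loading.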

IV. Building the network

1. model.py

import torch.nn as nn
import torch

# Residual block for ResNet18/34: two 3x3 convolutions
class BasicBlock(nn.Module):
    expansion = 1  # ratio of the block's output channels to out_channel; 1 means the main branch keeps the channel count

    # downsample lets this one class act as either a solid-line block (None) or a dashed-line block;
    # the first block of conv3_x, conv4_x and conv5_x has to adjust the shape of its input
    def __init__(self, in_channel, out_channel, stride=1, downsample=None, **kwargs):  # downsample: shortcut of a dashed-line block
        super(BasicBlock, self).__init__()
        # stride is passed in: 1 for a solid-line block (size unchanged), 2 for a dashed-line block
        # bias=False because the conv is followed by BN
        # stride=1: output = (input - 3 + 2*1) / 1 + 1 = input, height and width are unchanged
        # stride=2: output = (input - 3 + 2*1) / 2 + 1 = input/2 + 0.5, i.e. input/2 after rounding down
        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=out_channel,
                               kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channel)
        self.relu = nn.ReLU()
        self.conv2 = nn.Conv2d(in_channels=out_channel, out_channels=out_channel,
                               kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channel)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:   # dashed-line residual block: the shortcut must reshape x
            identity = self.downsample(x)  # shortcut branch

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        out += identity
        out = self.relu(out)

        return out

# Residual block for ResNet50/101/152: 1x1 + 3x3 + 1x1 convolutions
class Bottleneck(nn.Module):
    """
    Note: in the original paper, on the main branch of a dashed-line residual block,
    the first 1x1 conv has stride 2 and the 3x3 conv has stride 1.
    In the official PyTorch implementation the first 1x1 conv has stride 1 and the 3x3 conv has stride 2,
    which improves top-1 accuracy by roughly 0.5%.
    See ResNet v1.5: https://ngc.nvidia.com/catalog/model-scripts/nvidia:resnet_50_v1_5_for_pytorch
    """
    expansion = 4  # the third conv has 4 times as many kernels as the first and second convs

    def __init__(self, in_channel, out_channel, stride=1, downsample=None,
                 groups=1, width_per_group=64):
        super(Bottleneck, self).__init__()

        width = int(out_channel * (width_per_group / 64.)) * groups

        self.conv1 = nn.Conv2d(in_channels=in_channel, out_channels=width,
                               kernel_size=1, stride=1, bias=False)  # squeeze channels
        self.bn1 = nn.BatchNorm2d(width)
        # -----------------------------------------
        self.conv2 = nn.Conv2d(in_channels=width, out_channels=width, groups=groups,
                               kernel_size=3, stride=stride, bias=False, padding=1)
                               # stride comes from the caller: 1 in a solid-line block, 2 in a dashed-line block
        self.bn2 = nn.BatchNorm2d(width)
        # -----------------------------------------
        self.conv3 = nn.Conv2d(in_channels=width, out_channels=out_channel*self.expansion,      # 4 times as many kernels
                               kernel_size=1, stride=1, bias=False)  # unsqueeze channels
        self.bn3 = nn.BatchNorm2d(out_channel*self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None: 
            identity = self.downsample(x)  # shortcut branch

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        out += identity
        out = self.relu(out)

        return out

# Overall network framework
class ResNet(nn.Module):

    # block: BasicBlock or Bottleneck
    # blocks_num: list with the number of residual blocks in conv2_x to conv5_x, e.g. [3, 4, 6, 3] for the 34-layer network
    def __init__(self,
                 block,
                 blocks_num,
                 num_classes=1000,
                 include_top=True,      # set to False to build other networks on top of ResNet; not needed here
                 groups=1,
                 width_per_group=64):
        super(ResNet, self).__init__()
        self.include_top = include_top
        self.in_channel = 64

        self.groups = groups
        self.width_per_group = width_per_group

        self.conv1 = nn.Conv2d(3, self.in_channel, kernel_size=7, stride=2,
                               padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(self.in_channel)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, blocks_num[0])        # all residual blocks of conv2_x, built by _make_layer
        self.layer2 = self._make_layer(block, 128, blocks_num[1], stride=2)
        self.layer3 = self._make_layer(block, 256, blocks_num[2], stride=2)
        self.layer4 = self._make_layer(block, 512, blocks_num[3], stride=2)
        if self.include_top:
            self.avgpool = nn.AdaptiveAvgPool2d((1, 1))  # adaptive average pooling, output size = (1, 1)
            self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out', nonlinearity='relu')

    # channel: number of kernels in the first conv of each residual block in this stage
    # block_num: number of residual blocks in this stage, e.g. 3, 4, 6, 3 for the 34-layer network
    def _make_layer(self, block, channel, block_num, stride=1):
        downsample = None
        # True for layer2/3/4 (stride=2) and, for ResNet50/101/152 (block.expansion=4), also for layer1
        if stride != 1 or self.in_channel != channel * block.expansion:
            # shortcut of the dashed-line block; for conv2_x (stride=1) only the depth is adjusted
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channel, channel * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(channel * block.expansion))

        layers = []
        # add the first residual block of the stage; block is BasicBlock or Bottleneck
        layers.append(block(self.in_channel,        # depth of the input feature map (64 for layer1)
                            channel,                # number of kernels in the first conv on the main branch
                            downsample=downsample,  # not None for a dashed-line block
                            stride=stride,          # layer1 of 50/101/152: stride=1, height and width unchanged, depth x4
                            groups=self.groups,
                            width_per_group=self.width_per_group))
        self.in_channel = channel * block.expansion

        # append the remaining solid-line blocks; start from 1 because block 0 was added above
        for _ in range(1, block_num):
            layers.append(block(self.in_channel,
                                channel,
                                groups=self.groups,
                                width_per_group=self.width_per_group))

        # unpack the list into positional arguments; nn.Sequential chains the blocks into one stage
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)

        if self.include_top:
            x = self.avgpool(x)
            x = torch.flatten(x, 1)
            x = self.fc(x)

        return x


def resnet34(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet34-333f7ec4.pth
    return ResNet(BasicBlock, [3, 4, 6, 3], num_classes=num_classes, include_top=include_top)


def resnet50(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet50-19c8e357.pth
    return ResNet(Bottleneck, [3, 4, 6, 3], num_classes=num_classes, include_top=include_top)


def resnet101(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnet101-5d3b4d8f.pth
    return ResNet(Bottleneck, [3, 4, 23, 3], num_classes=num_classes, include_top=include_top)


def resnext50_32x4d(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnext50_32x4d-7cdf4587.pth
    groups = 32
    width_per_group = 4
    return ResNet(Bottleneck, [3, 4, 6, 3],
                  num_classes=num_classes,
                  include_top=include_top,
                  groups=groups,
                  width_per_group=width_per_group)


def resnext101_32x8d(num_classes=1000, include_top=True):
    # https://download.pytorch.org/models/resnext101_32x8d-8ba56ff5.pth
    groups = 32
    width_per_group = 8
    return ResNet(Bottleneck, [3, 4, 23, 3],
                  num_classes=num_classes,
                  include_top=include_top,
                  groups=groups,
                  width_per_group=width_per_group)
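The `groups` and `width_per_group` arguments only affect the width of the 1x1 and 3x3 convolutions inside Bottleneck. The sketch below copies the width expression from the code above into a stand-alone helper (the name `bottleneck_width` is made up here) and evaluates it for the four stage widths 64, 128, 256 and 512.

```python
def bottleneck_width(out_channel: int, groups: int = 1, width_per_group: int = 64) -> int:
    """Channel count of the first two convs in a Bottleneck, same expression as in the code above."""
    return int(out_channel * (width_per_group / 64.)) * groups


configs = [("resnet50", 1, 64), ("resnext50_32x4d", 32, 4), ("resnext101_32x8d", 32, 8)]
for name, groups, width_per_group in configs:
    widths = [bottleneck_width(c, groups, width_per_group) for c in (64, 128, 256, 512)]
    print(name, widths)
```

With the default arguments the width equals out_channel, so plain ResNet50/101 are unaffected; the ResNeXt variants get wider grouped 3x3 convolutions while the block's output depth (out_channel times expansion) stays the same.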

Because ResNet is deep, training it from scratch is very time-consuming, so transfer learning is used and pretrained model parameters are loaded first.

The download links for the pretrained parameters are given in the comments of the code above.
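When the target task has a different number of classes, the classifier weights in the pretrained file no longer fit. One possible way to handle this, shown here as an alternative sketch rather than the approach taken in train.py below, is to drop the `fc` entries from the state dict and load the rest with strict=False. `TinyNet` is a hypothetical stand-in that only shares the head name `fc` with the ResNet class above, and its freshly initialized state dict plays the role of the downloaded .pth file.

```python
import torch
import torch.nn as nn


class TinyNet(nn.Module):
    """Hypothetical stand-in for ResNet: one backbone layer plus a classifier named `fc`."""

    def __init__(self, num_classes):
        super().__init__()
        self.backbone = nn.Linear(4, 8)
        self.fc = nn.Linear(8, num_classes)

    def forward(self, x):
        return self.fc(torch.relu(self.backbone(x)))


pretrained_state = TinyNet(num_classes=1000).state_dict()  # stands in for the downloaded weights
net = TinyNet(num_classes=5)

# Drop the classifier weights, whose shape depends on the number of classes.
filtered = {k: v for k, v in pretrained_state.items() if not k.startswith("fc.")}

# strict=False tolerates the keys that are now missing and reports them.
result = net.load_state_dict(filtered, strict=False)
print(result.missing_keys, result.unexpected_keys)
```

The keys reported as missing are exactly the layers that keep their fresh initialization and still have to be trained.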

2. train.py

import os
import sys
import json

import torch
import torch.nn as nn
import torch.optim as optim
from torchvision import transforms, datasets
from tqdm import tqdm

from model import resnet34


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print("using {} device.".format(device))

    data_transform = {
        "train": transforms.Compose([transforms.RandomResizedCrop(224),
                                     transforms.RandomHorizontalFlip(),
                                     transforms.ToTensor(),
                                     transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])]),
        "val": transforms.Compose([transforms.Resize(256),      # keep the aspect ratio, scale the shorter side to 256
                                   transforms.CenterCrop(224),      # center crop
                                   transforms.ToTensor(),
                                   transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])}

    data_root = os.path.abspath(os.path.join(os.getcwd(), "../"))  # get data root path
    image_path = os.path.join(data_root, "data_set", "flower_data")  # flower data set path
    assert os.path.exists(image_path), "{} path does not exist.".format(image_path)
    train_dataset = datasets.ImageFolder(root=os.path.join(image_path, "train"),
                                         transform=data_transform["train"])
    train_num = len(train_dataset)

    # {'daisy':0, 'dandelion':1, 'roses':2, 'sunflower':3, 'tulips':4}
    flower_list = train_dataset.class_to_idx
    cla_dict = dict((val, key) for key, val in flower_list.items())
    # write dict into json file
    json_str = json.dumps(cla_dict, indent=4)
    with open('class_indices.json', 'w') as json_file:
        json_file.write(json_str)

    batch_size = 4
    nw = min([os.cpu_count(), batch_size if batch_size > 1 else 0, 8])  # number of workers
    print('Using {} dataloader workers every process'.format(nw))

    train_loader = torch.utils.data.DataLoader(train_dataset,
                                               batch_size=batch_size, shuffle=True,
                                               num_workers=nw)

    validate_dataset = datasets.ImageFolder(root=os.path.join(image_path, "val"),
                                            transform=data_transform["val"])
    val_num = len(validate_dataset)
    validate_loader = torch.utils.data.DataLoader(validate_dataset,
                                                  batch_size=batch_size, shuffle=False,
                                                  num_workers=nw)

    print("using {} images for training, {} images for validation.".format(train_num,
                                                                           val_num))
    
    net = resnet34()
    # load pretrain weights
    # download url: https://download.pytorch.org/models/resnet34-333f7ec4.pth
    model_weight_path = "./resnet34-pre.pth"
    assert os.path.exists(model_weight_path), "file {} does not exist.".format(model_weight_path)
    net.load_state_dict(torch.load(model_weight_path, map_location='cpu'))
    # for param in net.parameters():
    #     param.requires_grad = False

    # change fc layer structure
    in_channel = net.fc.in_features
    net.fc = nn.Linear(in_channel, 5)
    net.to(device)

    # define loss function
    loss_function = nn.CrossEntropyLoss()

    # construct an optimizer
    params = [p for p in net.parameters() if p.requires_grad]
    optimizer = optim.Adam(params, lr=0.0001)

    epochs = 3
    best_acc = 0.0
    save_path = './resNet34.pth'
    train_steps = len(train_loader)
    for epoch in range(epochs):
        # train
        net.train()
        running_loss = 0.0
        train_bar = tqdm(train_loader, file=sys.stdout)
        for step, data in enumerate(train_bar):
            images, labels = data
            optimizer.zero_grad()
            logits = net(images.to(device))
            loss = loss_function(logits, labels.to(device))
            loss.backward()
            optimizer.step()

            # print statistics
            running_loss += loss.item()

            train_bar.desc = "train epoch[{}/{}] loss:{:.3f}".format(epoch + 1,
                                                                     epochs,
                                                                     loss)

        # validate
        net.eval()
        acc = 0.0  # accumulate accurate number / epoch
        with torch.no_grad():
            val_bar = tqdm(validate_loader, file=sys.stdout)
            for val_data in val_bar:
                val_images, val_labels = val_data
                outputs = net(val_images.to(device))
                # loss = loss_function(outputs, test_labels)
                predict_y = torch.max(outputs, dim=1)[1]
                acc += torch.eq(predict_y, val_labels.to(device)).sum().item()

                val_bar.desc = "valid epoch[{}/{}]".format(epoch + 1,
                                                           epochs)

        val_accurate = acc / val_num
        print('[epoch %d] train_loss: %.3f  val_accuracy: %.3f' %
              (epoch + 1, running_loss / train_steps, val_accurate))

        if val_accurate > best_acc:
            best_acc = val_accurate
            torch.save(net.state_dict(), save_path)

    print('Finished Training')


if __name__ == '__main__':
    main()

3. predict.py

import os
import json

import torch
from PIL import Image
from torchvision import transforms
import matplotlib.pyplot as plt

from model import resnet34


def main():
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    data_transform = transforms.Compose(
        [transforms.Resize(256),
         transforms.CenterCrop(224),
         transforms.ToTensor(),
         transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])])

    # load image
    img_path = "../tulip.jpg"
    assert os.path.exists(img_path), "file: '{}' does not exist.".format(img_path)
    img = Image.open(img_path)
    plt.imshow(img)
    # [N, C, H, W]
    img = data_transform(img)
    # expand batch dimension
    img = torch.unsqueeze(img, dim=0)

    # read class_indict
    json_path = './class_indices.json'
    assert os.path.exists(json_path), "file: '{}' does not exist.".format(json_path)

    with open(json_path, "r") as f:
        class_indict = json.load(f)

    # create model
    model = resnet34(num_classes=5).to(device)

    # load model weights
    weights_path = "./resNet34.pth"
    assert os.path.exists(weights_path), "file: '{}' does not exist.".format(weights_path)
    model.load_state_dict(torch.load(weights_path, map_location=device))

    # prediction
    model.eval()
    with torch.no_grad():
        # predict class
        output = torch.squeeze(model(img.to(device))).cpu()
        predict = torch.softmax(output, dim=0)
        predict_cla = torch.argmax(predict).numpy()

    print_res = "class: {}   prob: {:.3}".format(class_indict[str(predict_cla)],
                                                 predict[predict_cla].numpy())
    plt.title(print_res)
    for i in range(len(predict)):
        print("class: {:10}   prob: {:.3}".format(class_indict[str(i)],
                                                  predict[i].numpy()))
    plt.show()


if __name__ == '__main__':
    main()
