
Improving the YOLOv8 Series: RFAConv, a New Plug-and-Play Attention Mechanism


I. Preface

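Spatial attention has been widely used to improve the performance of convolutional neural networks by letting them focus on important information. However, it has certain limitations. In this paper, we offer a new perspective on why spatial attention is effective: it can solve the problem of convolution kernel parameter sharing. Even so, the information contained in the attention map produced by spatial attention is not sufficient for large convolution kernels. We therefore introduce a new attention mechanism called Receptive-Field Attention (RFA). Previous attention mechanisms such as the Convolutional Block Attention Module (CBAM) and Coordinate Attention (CA) focus only on spatial features, so they cannot fully solve the parameter sharing problem of convolution kernels. In contrast, RFA not only focuses on receptive-field spatial features but also provides effective attention weights for large convolution kernels. The receptive-field attention convolution operation (RFAConv), developed from RFA, is a new way to replace the standard convolution operation. It adds an almost negligible amount of computation and parameters while significantly improving network performance. We ran a series of experiments on the ImageNet-1k, MS COCO, and VOC datasets, which demonstrate the superiority of our method across tasks including classification, object detection, and semantic segmentation. Most importantly, we believe it is time for current spatial attention mechanisms to shift their focus from spatial features to receptive-field spatial features. Doing so can further improve network performance and yield better results.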

1. Problem Addressed

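By examining the inherent limitations of the convolution operation and the characteristics of attention mechanisms, we argue that although current spatial attention mechanisms have fundamentally addressed the parameter sharing problem in convolution, they remain limited to recognizing spatial features. Current spatial attention mechanisms, such as the Convolutional Block Attention Module (CBAM) [17] and Coordinate Attention (CA) [18], do not fully solve the parameter sharing problem for larger convolution kernels. In addition, they cannot emphasize the importance of each feature within the receptive field. We therefore introduce a new Receptive-Field Attention (RFA) mechanism that comprehensively addresses the parameter sharing problem of convolution kernels and takes into account the importance of each feature in the receptive field. The convolution operation designed with RFA (RFAConv) is a groundbreaking approach that can replace the standard convolution operation in current neural networks. With only a few extra parameters and a small computational overhead, RFAConv improves network performance.
RFAConv: Innovating Spatial Attention and Standard Convolutional Operation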
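
The parameter sharing argument can be written compactly. The block below is a sketch in my own notation, not copied from the paper: S = k^2 is the number of positions in a k×k receptive field, X_{ji} is the i-th feature inside the j-th receptive field, K_i is the i-th kernel weight, and A_{ji} is an assumed per-feature attention weight.

```latex
% Standard convolution: the same kernel weights K_i are reused for every receptive field j.
F_j = \sum_{i=1}^{S} X_{ji} \, K_i

% Receptive-field attention: every feature has its own weight A_{ji},
% so the effective kernel A_{ji} K_i is no longer shared across receptive fields.
F_j = \sum_{i=1}^{S} A_{ji} \, X_{ji} \, K_i ,
\qquad \sum_{i=1}^{S} A_{ji} = 1
```

The normalization over i corresponds to a softmax taken across the positions of one receptive field.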

2. How RFAConv Works

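Recent studies have shown that interacting information can improve network performance, as shown in [40, 41, 42]. Similarly, for RFAConv, learning the attention map by interacting receptive-field feature information can improve network performance. However, interacting with every receptive-field feature causes additional computational overhead. To minimize the computational overhead and the number of parameters, AvgPool is used to aggregate the global information of each receptive-field feature. A 1×1 group convolution is then used to interact the information. Finally, softmax is used to emphasize the importance of each feature within the receptive field.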
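
The three steps above (AvgPool, 1×1 group convolution, softmax) can be sketched in a few lines of PyTorch. This is only a minimal illustration under assumed sizes; the variable names `pool`, `interact`, and `attention` are mine, and stride 1 is assumed so the spatial size stays the same.

```python
import torch
from torch import nn

# Illustrative sizes (assumptions, not values from the article).
batch, channels, k, height, width = 2, 4, 3, 8, 8

# AvgPool summarizes each k x k receptive field; with stride 1 and
# padding k // 2 the spatial size is unchanged.
pool = nn.AvgPool2d(kernel_size=k, stride=1, padding=k // 2)

# 1x1 group convolution: each input channel produces k*k attention logits.
interact = nn.Conv2d(channels, channels * k * k, kernel_size=1,
                     groups=channels, bias=False)

x = torch.randn(batch, channels, height, width)
logits = interact(pool(x))                                   # (b, c*k*k, h, w)
logits = logits.view(batch, channels, k * k, height, width)  # (b, c, k*k, h, w)

# Softmax over the k*k axis: the weights inside one receptive field sum to 1.
attention = logits.softmax(dim=2)

print(attention.shape)
```

Because the softmax runs over the k*k axis rather than over space, every position inside a receptive field gets its own weight, which is the point of RFA.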

II. How to Add the Module

#RFA exp start********************************
import torch
import torch.nn as nn

class CAConv(nn.Module):
    def __init__(self, inp, oup, kernel_size, stride, reduction=32):
        super(CAConv, self).__init__()
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))

        mip = max(8, inp // reduction)

        self.conv1 = nn.Conv2d(inp, mip, kernel_size=1, stride=1, padding=0)
        self.bn1 = nn.BatchNorm2d(mip)
        # h_swish is not defined in this post; nn.Hardswish computes the same x * relu6(x + 3) / 6
        self.act = nn.Hardswish()

        self.conv_h = nn.Conv2d(mip, inp, kernel_size=1, stride=1, padding=0)
        self.conv_w = nn.Conv2d(mip, inp, kernel_size=1, stride=1, padding=0)
        self.conv = nn.Sequential(nn.Conv2d(inp, oup, kernel_size, padding=kernel_size // 2, stride=stride),
                                  nn.BatchNorm2d(oup),
                                  nn.ReLU())

    def forward(self, x):
        identity = x

        n, c, h, w = x.size()
        x_h = self.pool_h(x)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)

        y = torch.cat([x_h, x_w], dim=2)
        y = self.conv1(y)
        y = self.bn1(y)
        y = self.act(y)

        x_h, x_w = torch.split(y, [h, w], dim=2)
        x_w = x_w.permute(0, 1, 3, 2)

        a_h = self.conv_h(x_h).sigmoid()
        a_w = self.conv_w(x_w).sigmoid()

        out = identity * a_w * a_h

        return self.conv(out)


class CBAMConv(nn.Module):
    def __init__(self, channel, out_channel, kernel_size, stride, reduction=16, spatial_kernel=7):
        super().__init__()

        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.mlp = nn.Sequential(
            nn.Conv2d(channel, channel // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channel // reduction, channel, 1, bias=False)
        )

        self.spatital = nn.Conv2d(2, 1, kernel_size=spatial_kernel,
                                  padding=spatial_kernel // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

        self.conv = nn.Sequential(nn.Conv2d(channel, out_channel, kernel_size, padding=kernel_size // 2, stride=stride),
                                  nn.BatchNorm2d(out_channel),
                                  nn.ReLU())

    def forward(self, x):
        max_out = self.mlp(self.max_pool(x))
        avg_out = self.mlp(self.avg_pool(x))
        channel_out = self.sigmoid(max_out + avg_out)
        x = channel_out * x

        max_out, _ = torch.max(x, dim=1, keepdim=True)
        avg_out = torch.mean(x, dim=1, keepdim=True)
        spatial_out = self.sigmoid(self.spatital(torch.cat([max_out, avg_out], dim=1)))
        x = spatial_out * x
        return self.conv(x)


class CAMConv(nn.Module):
    def __init__(self, channel, out_channel, kernel_size, stride, reduction=16, spatial_kernel=7):
        super().__init__()

        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.mlp = nn.Sequential(
            nn.Conv2d(channel, channel // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channel // reduction, channel, 1, bias=False)
        )
        self.sigmoid = nn.Sigmoid()
        self.conv = nn.Sequential(nn.Conv2d(channel, out_channel, kernel_size, padding=kernel_size // 2, stride=stride),
                                  nn.BatchNorm2d(out_channel),
                                  nn.ReLU())

    def forward(self, x):
        max_out = self.mlp(self.max_pool(x))
        avg_out = self.mlp(self.avg_pool(x))
        channel_out = self.sigmoid(max_out + avg_out)
        x = channel_out * x
        return self.conv(x)
#RFA exp end********************************
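
To make the tensor shapes in CAConv.forward easier to follow, here is a small shape walk-through of the coordinate-attention pooling step. It is a sketch with arbitrary example sizes; the sigmoid gates are stand-ins for a_h and a_w, and the 1×1 convolutions are left out.

```python
import torch
from torch import nn

x = torch.randn(2, 16, 20, 32)  # (n, c, h, w), arbitrary example sizes
n, c, h, w = x.size()

pool_h = nn.AdaptiveAvgPool2d((None, 1))  # average over width  -> (n, c, h, 1)
pool_w = nn.AdaptiveAvgPool2d((1, None))  # average over height -> (n, c, 1, w)

x_h = pool_h(x)
x_w = pool_w(x).permute(0, 1, 3, 2)       # (n, c, w, 1)

# Both directions are stacked along dim 2 so they can share one 1x1 conv.
y = torch.cat([x_h, x_w], dim=2)          # (n, c, h + w, 1)

# Split back into the two directions and restore the width layout.
x_h2, x_w2 = torch.split(y, [h, w], dim=2)
x_w2 = x_w2.permute(0, 1, 3, 2)           # (n, c, 1, w)

# Stand-ins for the gates a_h and a_w; broadcasting restores (n, c, h, w).
gated = x * x_w2.sigmoid() * x_h2.sigmoid()
print(y.shape, gated.shape)
```

The (n, c, h, 1) and (n, c, 1, w) gates broadcast against the input, so each position is scaled by one row weight and one column weight.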

YOLOv5 yaml file

# YOLOv5 🚀 by Ultralytics, GPL-3.0 license

# Parameters
nc: 80  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, CAConv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, CAConv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, CAConv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, CAConv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
  ]

# YOLOv5 v6.0 head
head:
  [[-1, 1, CAConv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, CAConv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, CAConv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, CAConv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
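
For a row such as `[-1, 1, CAConv, [128, 3, 2]]` to work, the model parser has to prepend the input channel count and scale the output channels by width_multiple. As far as I understand, in YOLOv5 this means adding the new class to the set of modules handled in parse_model in models/yolo.py. The sketch below only mimics that arithmetic; `resolve_layer` is a hypothetical helper of mine, not part of YOLOv5.

```python
import math


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def resolve_layer(c1, number, args, depth_multiple=0.33, width_multiple=0.50):
    """Hypothetical helper: turn one yaml row into (repeats, constructor args)."""
    c2 = make_divisible(args[0] * width_multiple, 8)
    repeats = max(round(number * depth_multiple), 1) if number > 1 else number
    return repeats, [c1, c2, *args[1:]]


# Row "[-1, 1, CAConv, [128, 3, 2]]" fed by layer 0, which outputs
# 32 channels after width scaling (64 * 0.50).
repeats, ctor_args = resolve_layer(c1=32, number=1, args=[128, 3, 2])
print(repeats, ctor_args)
```

Under these assumptions the row resolves to CAConv(32, 64, 3, 2), which matches the (inp, oup, kernel_size, stride) signature defined earlier.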

Code

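This is a version I implemented myself. It does not seem quite right and my knowledge is limited, so I would appreciate it if someone more experienced could point out the mistakes.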

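# Conv is assumed to be the standard YOLO Conv block (Conv2d + BatchNorm + activation);
# Conv_L is not defined in this post.
class RFCAConv(nn.Module):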
    def __init__(self, c1, c2, kernel_size, stride):
        super(RFCAConv, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.group_conv1 = Conv_L(c1, 9 *c1, k=1, g=c1)
        self.group_conv2 = Conv_L(c1, 9 *c1, k=3, g=c1)
        self.group_conv3 = Conv_L(c1, 9 *c1, k=5, g=c1)

        self.softmax = nn.Softmax(dim=1)

        self.group_conv = Conv(c1, 9 * c1, k=3, g=c1)
        self.convDown = Conv(c1, c1, k=3, s=3)
        self.CA = CAConv(c1, c2, kernel_size, stride)
    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x)

        group1 = self.softmax(self.group_conv1(y))
        group2 = self.softmax(self.group_conv2(y))
        group3 = self.softmax(self.group_conv3(y))
        # g1 =  torch.cat([group1, group2, group3], dim=1)

        g2 = self.group_conv(x)

        out1 = g2 * group1.expand_as(g2)
        out2 = g2 * group2.expand_as(g2)
        out3 = g2 * group3.expand_as(g2)

        out = sum([out1, out2, out3])
        # Get the shape of the feature map
        batch_size, channels, height, width = out.shape

        # Compute the number of channels of the output feature map
        output_channels = channels // 9

        # Reshape and transpose the feature map: split the channels into 3x3 sub-channels and expand height and width
        out = out.view(batch_size, output_channels, 3, 3, height, width).permute(0, 1, 4, 2, 5, 3).\
                                                reshape(batch_size, output_channels, 3 * height, 3 * width)
        out = self.convDown(out)
        out = self.CA(out)
        return out

I then revised it as follows:

class RFCAConv2(nn.Module):
    def __init__(self, c1, c2, kernel_size, stride):
        super(RFCAConv2, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.group_conv1 = Conv_L(c1, 3 *c1, k=1, g=c1)
        self.group_conv2 = Conv_L(c1, 3 *c1, k=3, g=c1)
        self.group_conv3 = Conv_L(c1, 3 *c1, k=5, g=c1)

        self.softmax = nn.Softmax(dim=1)

        self.group_conv = Conv(c1, 3 * c1, k=3, g=c1)
        self.convDown = Conv(c1, c1, k=3, s=3,g=c1)
        self.CA = CAConv(c1, c2, kernel_size, stride)
    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x)

        group1 = self.softmax(self.group_conv1(y))
        group2 = self.softmax(self.group_conv2(y))
        group3 = self.softmax(self.group_conv3(y))
        # g1 =  torch.cat([group1, group2, group3], dim=1)

        g1 = self.group_conv(x)
        # g2 = self.group_conv(x)
        # g3 = self.group_conv(x)

        out1 = g1 * group1
        out2 = g1 * group2
        out3 = g1 * group3

        out =torch.cat([out1, out2, out3],dim=1)
        # Get the shape of the feature map
        batch_size, channels, height, width = out.shape

        # Number of channels of the output feature map
        output_channels = c

        # Reshape and transpose the feature map: split the channels into 3x3 sub-channels and expand height and width
        out = out.view(batch_size, output_channels, 3, 3, height, width).permute(0, 1, 4, 2, 5, 3).\
                                                reshape(batch_size, output_channels, 3 * height, 3 * width)
        # out = out.view(batch_size, output_channels, height*3, width*3)
        out = self.convDown(out)
        out = self.CA(out)
        return out
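
The view/permute/reshape chain used in both classes above spreads every group of nine channels into a 3×3 spatial block. The sketch below checks the index mapping on random data and compares it with F.pixel_shuffle, which I believe performs the same rearrangement; the sizes are arbitrary.

```python
import torch
import torch.nn.functional as F

batch, channels, height, width = 2, 4, 5, 6
x = torch.randn(batch, channels * 9, height, width)

# The rearrangement used in RFCAConv / RFCAConv2.
manual = (x.view(batch, channels, 3, 3, height, width)
           .permute(0, 1, 4, 2, 5, 3)
           .reshape(batch, channels, 3 * height, 3 * width))

# Element (n1, n2) of the 3x3 block at (h, w) comes from channel c*9 + n1*3 + n2.
c, h, w, n1, n2 = 1, 2, 3, 2, 0
same_value = manual[0, c, 3 * h + n1, 3 * w + n2] == x[0, c * 9 + n1 * 3 + n2, h, w]

# Built-in equivalent of the same channel-to-space rearrangement.
shuffled = F.pixel_shuffle(x, upscale_factor=3)
print(bool(same_value), torch.equal(manual, shuffled))
```

One thing this makes visible: view groups nine consecutive channels into one block. In RFCAConv2, after torch.cat([out1, out2, out3], dim=1), nine consecutive channels come from three different input channels of out1 rather than from one input channel, which may be part of why the result looks off.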


Official RFAConv code

import torch
from torch import nn
from einops import rearrange

class RFAConv(nn.Module): # RFAConv implemented with Group Conv
    def __init__(self,in_channel,out_channel,kernel_size,stride=1):
        super().__init__()
        self.kernel_size = kernel_size

        self.get_weight = nn.Sequential(nn.AvgPool2d(kernel_size=kernel_size, padding=kernel_size // 2, stride=stride),
                                        nn.Conv2d(in_channel, in_channel * (kernel_size ** 2), kernel_size=1, groups=in_channel,bias=False))
        self.generate_feature = nn.Sequential(
            nn.Conv2d(in_channel, in_channel * (kernel_size ** 2), kernel_size=kernel_size,padding=kernel_size//2,stride=stride, groups=in_channel, bias=False),
            nn.BatchNorm2d(in_channel * (kernel_size ** 2)),
            nn.ReLU())
       
        self.conv = nn.Sequential(nn.Conv2d(in_channel, out_channel, kernel_size=kernel_size, stride=kernel_size),
                                  nn.BatchNorm2d(out_channel),
                                  nn.ReLU())

    def forward(self,x):
        b,c = x.shape[0:2]
        weight =  self.get_weight(x)
        h,w = weight.shape[2:]
        weighted = weight.view(b, c, self.kernel_size ** 2, h, w).softmax(2)  # b c*kernel**2,h,w ->  b c k**2 h w 
        feature = self.generate_feature(x).view(b, c, self.kernel_size ** 2, h, w)  #b c*kernel**2,h,w ->  b c k**2 h w   obtain the receptive-field spatial features
        weighted_data = feature * weighted
        conv_data = rearrange(weighted_data, 'b c (n1 n2) h w -> b c (h n1) (w n2)', n1=self.kernel_size, # b c k**2 h w ->  b c h*k w*k
                              n2=self.kernel_size)
        return self.conv(conv_data)
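
To get a feel for the extra parameters the official implementation adds, the sketch below rebuilds the three sub-modules with the same layer shapes and counts their learnable parameters. The channel sizes are example values I picked, and `count_params` is a helper of mine.

```python
from torch import nn


def count_params(module):
    """Number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())


in_channel, out_channel, k = 64, 64, 3  # example sizes, chosen for illustration

# The two branches that RFAConv adds in front of the final convolution.
get_weight = nn.Conv2d(in_channel, in_channel * k * k, kernel_size=1,
                       groups=in_channel, bias=False)
generate_feature = nn.Sequential(
    nn.Conv2d(in_channel, in_channel * k * k, kernel_size=k, padding=k // 2,
              groups=in_channel, bias=False),
    nn.BatchNorm2d(in_channel * k * k),
)

# The final convolution has the same weight shape as a standard k x k convolution.
conv = nn.Sequential(
    nn.Conv2d(in_channel, out_channel, kernel_size=k, stride=k),
    nn.BatchNorm2d(out_channel),
)

extra = count_params(get_weight) + count_params(generate_feature)
base = count_params(conv)
print(f"standard: {base}, extra from RFA branches: {extra} ({extra / base:.1%})")
```

The extra branches grow linearly with in_channel, while the standard convolution grows with in_channel × out_channel, so the relative overhead shrinks as the layers get wider.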
