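I. Introduction
Spatial attention has been widely used to improve the performance of convolutional neural networks by letting them focus on important information. However, it has certain limitations. In this paper, we offer a new perspective on why spatial attention is effective: it can solve the problem of convolutional kernel parameter sharing. Nevertheless, the information contained in the attention map generated by spatial attention is insufficient for large convolutional kernels. We therefore introduce a new attention mechanism called receptive-field attention (RFA). Previous attention mechanisms such as the Convolutional Block Attention Module (CBAM) and Coordinate Attention (CA) focus only on spatial features, so they cannot fully solve the kernel parameter sharing problem. In contrast, RFA not only focuses on receptive-field spatial features but also provides effective attention weights for large convolutional kernels. The receptive-field attention convolution (RFAConv) developed from RFA represents a new way to replace the standard convolution operation. It adds almost negligible computational cost and parameters while significantly improving network performance. We conducted a series of experiments on the ImageNet-1k, MS COCO, and VOC datasets, which demonstrate the superiority of our method in various tasks, including classification, object detection, and semantic segmentation. Most importantly, we believe it is time for current spatial attention mechanisms to shift their focus from spatial features to receptive-field spatial features. By doing so, we can further improve network performance and achieve better results.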
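1. Problem Addressed
By examining the inherent limitations of the convolution operation and the characteristics of attention mechanisms, we argue that although current spatial attention mechanisms fundamentally address the parameter sharing problem of the convolution operation, they remain limited to recognizing spatial features. They do not fully solve the parameter sharing problem for larger convolutional kernels. Moreover, they cannot emphasize the importance of each feature within the receptive field; this is the case for the existing Convolutional Block Attention Module (CBAM) [17] and Coordinate Attention (CA) [18]. We therefore introduce a new receptive-field attention mechanism (RFA) that fully solves the convolutional kernel parameter sharing problem and takes into account the importance of each feature in the receptive field. The convolution operation designed with RFA (RFAConv) is a groundbreaking method that can replace the standard convolution operation in current neural networks. With only a few additional parameters and a small computational overhead, RFAConv improves network performance.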
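To make the parameter sharing argument concrete, here is a minimal 1D sketch in plain Python. The signal `x`, kernel `K`, and attention values `a` are made-up numbers and the helper names are hypothetical; it only illustrates the idea and is not the paper's implementation. Scaling the input by a spatial attention map before a shared-kernel convolution is the same as using a different effective kernel in every window. Each attention value, however, is reused by every overlapping window that covers it, which is the limitation RFA targets.

```python
# Toy 1D illustration: spatial attention turns one shared kernel into
# position-dependent effective kernels. All numbers are made up.

def conv1d_valid(x, kernel):
    """Plain sliding-window correlation with a shared kernel (no padding)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]


def effective_kernels(attention, kernel):
    """Kernel each window effectively sees after x is scaled by attention."""
    k = len(kernel)
    return [[attention[i + j] * kernel[j] for j in range(k)]
            for i in range(len(attention) - k + 1)]


x = [1.0, 2.0, 3.0, 4.0, 5.0]
K = [1.0, 0.0, -1.0]            # shared by every window
a = [0.5, 1.0, 0.25, 1.0, 0.5]  # one attention value per position

# Standard convolution: every window uses exactly the same parameters K.
plain = conv1d_valid(x, K)

# Spatial attention followed by the same shared convolution.
attended = conv1d_valid([xi * ai for xi, ai in zip(x, a)], K)

# Equivalent view: a different kernel per window, applied to the raw input.
kernels = effective_kernels(a, K)
per_window = [sum(x[i + j] * kernels[i][j] for j in range(len(K)))
              for i in range(len(kernels))]

print(plain)       # [-2.0, -2.0, -2.0]
print(attended)    # [-0.25, -2.0, -1.75]
print(per_window)  # [-0.25, -2.0, -1.75]
# a[2] = 0.25 appears in all three effective kernels (last, middle, first
# slot), so overlapping windows cannot weight position 2 independently.
print(kernels)     # [[0.5, 0.0, -0.25], [1.0, 0.0, -1.0], [0.25, 0.0, -0.5]]
```

RFA instead learns a separate set of k weights (k² in 2D) for every window, so overlapping receptive fields no longer share attention values.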
RFAConv: Innovating Spatial Attention and Standard Convolutional Operation
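2. How RFAConv Works
Recent studies show that letting features exchange information can improve network performance, as shown in [40, 41, 42]. Similarly, for RFAConv, learning the attention map by exchanging information among receptive-field features can improve network performance. However, interacting with every receptive-field feature incurs extra computational overhead. To minimize the computational overhead and the number of parameters, AvgPool is used to aggregate the global information of each receptive-field feature. A 1×1 group convolution is then used to exchange information. Finally, softmax is used to emphasize the importance of each feature within its receptive field.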
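The three steps above can be written compactly. The following is a sketch in notation assumed here, read off from the official RFAConv code reproduced later in this post rather than quoted from the paper: $X$ is the input feature map, $g^{i \times i}$ is a group convolution with an $i \times i$ kernel, $k$ is the kernel size, and Norm is batch normalization.

```latex
\begin{aligned}
A_{rf} &= \operatorname{Softmax}\bigl(g^{1\times 1}(\operatorname{AvgPool}(X))\bigr) \\
F_{rf} &= \operatorname{ReLU}\bigl(\operatorname{Norm}(g^{k\times k}(X))\bigr) \\
F &= A_{rf} \times F_{rf}
\end{aligned}
```

In that code the softmax runs over the $k^2$ entries of each receptive field, so the attention weights inside one receptive field sum to 1, and $\times$ is an element-wise product.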
II. How to Add It
# RFA exp start********************************
import torch
import torch.nn as nn


# NOTE: h_swish is not defined in this post and must be provided before CAConv is used.
class CAConv(nn.Module):
    """Coordinate attention (CA) followed by a Conv-BN-ReLU block."""

    def __init__(self, inp, oup, kernel_size, stride, reduction=32):
        super(CAConv, self).__init__()
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # average over width  -> (n, c, h, 1)
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # average over height -> (n, c, 1, w)
        mip = max(8, inp // reduction)
        self.conv1 = nn.Conv2d(inp, mip, kernel_size=1, stride=1, padding=0)
        self.bn1 = nn.BatchNorm2d(mip)
        self.act = h_swish()
        self.conv_h = nn.Conv2d(mip, inp, kernel_size=1, stride=1, padding=0)
        self.conv_w = nn.Conv2d(mip, inp, kernel_size=1, stride=1, padding=0)
        self.conv = nn.Sequential(
            nn.Conv2d(inp, oup, kernel_size, padding=kernel_size // 2, stride=stride),
            nn.BatchNorm2d(oup),
            nn.ReLU())

    def forward(self, x):
        identity = x
        n, c, h, w = x.size()
        x_h = self.pool_h(x)
        x_w = self.pool_w(x).permute(0, 1, 3, 2)
        # Encode both directions with one shared 1x1 conv, then split them again.
        y = torch.cat([x_h, x_w], dim=2)
        y = self.conv1(y)
        y = self.bn1(y)
        y = self.act(y)
        x_h, x_w = torch.split(y, [h, w], dim=2)
        x_w = x_w.permute(0, 1, 3, 2)
        a_h = self.conv_h(x_h).sigmoid()  # (n, c, h, 1)
        a_w = self.conv_w(x_w).sigmoid()  # (n, c, 1, w)
        out = identity * a_w * a_h
        return self.conv(out)


class CBAMConv(nn.Module):
    """CBAM (channel attention, then spatial attention) followed by a Conv-BN-ReLU block."""

    def __init__(self, channel, out_channel, kernel_size, stride, reduction=16, spatial_kernel=7):
        super().__init__()
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.mlp = nn.Sequential(
            nn.Conv2d(channel, channel // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channel // reduction, channel, 1, bias=False)
        )
        self.spatital = nn.Conv2d(2, 1, kernel_size=spatial_kernel,
                                  padding=spatial_kernel // 2, bias=False)
        self.sigmoid = nn.Sigmoid()
        self.conv = nn.Sequential(
            nn.Conv2d(channel, out_channel, kernel_size, padding=kernel_size // 2, stride=stride),
            nn.BatchNorm2d(out_channel),
            nn.ReLU())

    def forward(self, x):
        # Channel attention
        max_out = self.mlp(self.max_pool(x))
        avg_out = self.mlp(self.avg_pool(x))
        channel_out = self.sigmoid(max_out + avg_out)
        x = channel_out * x
        # Spatial attention
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        avg_out = torch.mean(x, dim=1, keepdim=True)
        spatial_out = self.sigmoid(self.spatital(torch.cat([max_out, avg_out], dim=1)))
        x = spatial_out * x
        return self.conv(x)


class CAMConv(nn.Module):
    """Channel attention only (the channel half of CBAM) followed by a Conv-BN-ReLU block."""

    def __init__(self, channel, out_channel, kernel_size, stride, reduction=16, spatial_kernel=7):
        super().__init__()
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.mlp = nn.Sequential(
            nn.Conv2d(channel, channel // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channel // reduction, channel, 1, bias=False)
        )
        self.sigmoid = nn.Sigmoid()
        self.conv = nn.Sequential(
            nn.Conv2d(channel, out_channel, kernel_size, padding=kernel_size // 2, stride=stride),
            nn.BatchNorm2d(out_channel),
            nn.ReLU())

    def forward(self, x):
        max_out = self.mlp(self.max_pool(x))
        avg_out = self.mlp(self.avg_pool(x))
        channel_out = self.sigmoid(max_out + avg_out)
        x = channel_out * x
        return self.conv(x)
# RFA exp end********************************
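`CAConv` above calls `h_swish()`, which this post never defines. Here is a minimal sketch, assuming the usual hard-swish definition x · ReLU6(x + 3) / 6 that is commonly paired with Coordinate Attention. The split into `h_sigmoid` and `h_swish` classes is an assumption about the missing helper, not code from this post. PyTorch's built-in `nn.Hardswish` computes the same function, so it could be swapped in instead.

```python
import torch
import torch.nn as nn


class h_sigmoid(nn.Module):
    """Hard sigmoid: ReLU6(x + 3) / 6, a piecewise-linear sigmoid."""

    def __init__(self, inplace=True):
        super().__init__()
        self.relu = nn.ReLU6(inplace=inplace)

    def forward(self, x):
        # x + 3 creates a new tensor, so the in-place ReLU6 never modifies x.
        return self.relu(x + 3) / 6


class h_swish(nn.Module):
    """Hard swish: x * h_sigmoid(x)."""

    def __init__(self, inplace=True):
        super().__init__()
        self.sigmoid = h_sigmoid(inplace=inplace)

    def forward(self, x):
        return x * self.sigmoid(x)


# Sanity check against the built-in implementation.
x = torch.linspace(-5, 5, steps=11)
assert torch.allclose(h_swish()(x), nn.Hardswish()(x))
```

These definitions need to be in scope (for example in the same file as `CAConv`) before the model is built.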
YOLOv5 YAML file
# YOLOv5 🚀 by Ultralytics, GPL-3.0 license

# Parameters
nc: 80  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5 v6.0 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Conv, [64, 6, 2, 2]],  # 0-P1/2
   [-1, 1, CAConv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, CAConv, [256, 3, 2]],  # 3-P3/8
   [-1, 6, C3, [256]],
   [-1, 1, CAConv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, CAConv, [1024, 3, 2]],  # 7-P5/32
   [-1, 3, C3, [1024]],
   [-1, 1, SPPF, [1024, 5]],  # 9
  ]

# YOLOv5 v6.0 head
head:
  [[-1, 1, CAConv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, CAConv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, CAConv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, CAConv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
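The YAML lists only the output channels, kernel size, and stride for each `CAConv`, while the class constructor also needs the input channels. As far as I understand, YOLOv5 fills this in inside `parse_model` in `models/yolo.py`, which is also where a new module has to be registered. The sketch below is a simplified stand-in for that logic, not the actual Ultralytics code: `build_args` is a hypothetical helper, and it assumes the new module is handled like `Conv`, so the output channels are scaled by `width_multiple` and rounded up to a multiple of 8.

```python
import math


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def build_args(c_in, yaml_args, width_multiple):
    """Hypothetical helper: turn YAML args [c_out, k, s, ...] into constructor
    args [c_in, scaled c_out, k, s, ...] for a Conv-like module."""
    c_out = make_divisible(yaml_args[0] * width_multiple, 8)
    return [c_in, c_out, *yaml_args[1:]]


width_multiple = 0.50

# Layer 0 is Conv [64, 6, 2, 2] on a 3-channel image.
layer0 = build_args(3, [64, 6, 2, 2], width_multiple)

# Layer 1 is CAConv [128, 3, 2]; its input is layer 0's output.
layer1 = build_args(layer0[1], [128, 3, 2], width_multiple)

print(layer0)  # [3, 32, 6, 2, 2]
print(layer1)  # [32, 64, 3, 2]
```

Under these assumptions, layer 1 would be constructed as `CAConv(32, 64, 3, 2)`, which matches its `(inp, oup, kernel_size, stride)` signature.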
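Code
This is my own implementation. It doesn't seem quite right; my knowledge is limited, so I hope someone more experienced can point out the mistakes.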
# Conv is assumed to be the YOLOv5 Conv block; Conv_L is not defined in this post.
class RFCAConv(nn.Module):
    def __init__(self, c1, c2, kernel_size, stride):
        super(RFCAConv, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.group_conv1 = Conv_L(c1, 9 * c1, k=1, g=c1)
        self.group_conv2 = Conv_L(c1, 9 * c1, k=3, g=c1)
        self.group_conv3 = Conv_L(c1, 9 * c1, k=5, g=c1)
        self.softmax = nn.Softmax(dim=1)
        self.group_conv = Conv(c1, 9 * c1, k=3, g=c1)
        self.convDown = Conv(c1, c1, k=3, s=3)
        self.CA = CAConv(c1, c2, kernel_size, stride)

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x)
        group1 = self.softmax(self.group_conv1(y))
        group2 = self.softmax(self.group_conv2(y))
        group3 = self.softmax(self.group_conv3(y))
        # g1 = torch.cat([group1, group2, group3], dim=1)
        g2 = self.group_conv(x)
        out1 = g2 * group1.expand_as(g2)
        out2 = g2 * group2.expand_as(g2)
        out3 = g2 * group3.expand_as(g2)
        out = sum([out1, out2, out3])
        # Get the shape of the feature map
        batch_size, channels, height, width = out.shape
        # Compute the number of output channels
        output_channels = channels // 9
        # Reshape and permute so the channels are split into 3x3 sub-channels
        # that are unfolded into the height and width
        out = out.view(batch_size, output_channels, 3, 3, height, width) \
            .permute(0, 1, 4, 2, 5, 3) \
            .reshape(batch_size, output_channels, 3 * height, 3 * width)
        out = self.convDown(out)
        out = self.CA(out)
        return out
A revised version:
# Conv is assumed to be the YOLOv5 Conv block; Conv_L is not defined in this post.
class RFCAConv2(nn.Module):
    def __init__(self, c1, c2, kernel_size, stride):
        super(RFCAConv2, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.group_conv1 = Conv_L(c1, 3 * c1, k=1, g=c1)
        self.group_conv2 = Conv_L(c1, 3 * c1, k=3, g=c1)
        self.group_conv3 = Conv_L(c1, 3 * c1, k=5, g=c1)
        self.softmax = nn.Softmax(dim=1)
        self.group_conv = Conv(c1, 3 * c1, k=3, g=c1)
        self.convDown = Conv(c1, c1, k=3, s=3, g=c1)
        self.CA = CAConv(c1, c2, kernel_size, stride)

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x)
        group1 = self.softmax(self.group_conv1(y))
        group2 = self.softmax(self.group_conv2(y))
        group3 = self.softmax(self.group_conv3(y))
        # g1 = torch.cat([group1, group2, group3], dim=1)
        g1 = self.group_conv(x)
        # g2 = self.group_conv(x)
        # g3 = self.group_conv(x)
        out1 = g1 * group1
        out2 = g1 * group2
        out3 = g1 * group3
        out = torch.cat([out1, out2, out3], dim=1)  # (b, 9 * c, h, w)
        # Get the shape of the feature map
        batch_size, channels, height, width = out.shape
        # Compute the number of output channels
        output_channels = c
        # Reshape and permute so the channels are split into 3x3 sub-channels
        # that are unfolded into the height and width
        out = out.view(batch_size, output_channels, 3, 3, height, width) \
            .permute(0, 1, 4, 2, 5, 3) \
            .reshape(batch_size, output_channels, 3 * height, 3 * width)
        # out = out.view(batch_size, output_channels, height*3, width*3)
        out = self.convDown(out)
        out = self.CA(out)
        return out
Official RFAConv code
import torch
from torch import nn
from einops import rearrange


class RFAConv(nn.Module):  # RFAConv implemented with group convolution
    def __init__(self, in_channel, out_channel, kernel_size, stride=1):
        super().__init__()
        self.kernel_size = kernel_size

        # Attention branch: AvgPool -> 1x1 group conv, giving k**2 weights per channel and position
        self.get_weight = nn.Sequential(
            nn.AvgPool2d(kernel_size=kernel_size, padding=kernel_size // 2, stride=stride),
            nn.Conv2d(in_channel, in_channel * (kernel_size ** 2), kernel_size=1,
                      groups=in_channel, bias=False))
        # Feature branch: k x k group conv -> BN -> ReLU, giving k**2 features per channel and position
        self.generate_feature = nn.Sequential(
            nn.Conv2d(in_channel, in_channel * (kernel_size ** 2), kernel_size=kernel_size,
                      padding=kernel_size // 2, stride=stride, groups=in_channel, bias=False),
            nn.BatchNorm2d(in_channel * (kernel_size ** 2)),
            nn.ReLU())

        # Stride equals kernel size, so each step reads exactly one unfolded receptive field
        self.conv = nn.Sequential(
            nn.Conv2d(in_channel, out_channel, kernel_size=kernel_size, stride=kernel_size),
            nn.BatchNorm2d(out_channel),
            nn.ReLU())

    def forward(self, x):
        b, c = x.shape[0:2]
        weight = self.get_weight(x)
        h, w = weight.shape[2:]
        # b c*k**2 h w -> b c k**2 h w, softmax over the k**2 receptive-field positions
        weighted = weight.view(b, c, self.kernel_size ** 2, h, w).softmax(2)
        # b c*k**2 h w -> b c k**2 h w, the receptive-field spatial features
        feature = self.generate_feature(x).view(b, c, self.kernel_size ** 2, h, w)
        weighted_data = feature * weighted
        # b c k**2 h w -> b c h*k w*k
        conv_data = rearrange(weighted_data, 'b c (n1 n2) h w -> b c (h n1) (w n2)',
                              n1=self.kernel_size, n2=self.kernel_size)
        return self.conv(conv_data)
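The key step in the official code is the `rearrange` call, which unfolds the k² weighted features of each position into a k × k block so that a convolution with stride k reads exactly one receptive field at a time. The following sketch reproduces that layout with plain `view`/`permute`/`reshape` (the same pattern used in the hand-written versions earlier in this post) on a toy tensor. `unfold_receptive_fields` is a hypothetical name and the sizes are made up.

```python
import torch


def unfold_receptive_fields(data, k):
    """(b, c, k*k, h, w) -> (b, c, h*k, w*k): each position becomes a k x k block.

    Same layout as einops' 'b c (n1 n2) h w -> b c (h n1) (w n2)'.
    """
    b, c, _, h, w = data.shape
    return (data.view(b, c, k, k, h, w)
                .permute(0, 1, 4, 2, 5, 3)   # b, c, h, n1, w, n2
                .reshape(b, c, h * k, w * k))


k, h, w = 3, 2, 2
# Value = 10 * (position index) + (index inside the receptive field).
pos = torch.arange(h * w).view(1, 1, 1, h, w)
idx = torch.arange(k * k).view(1, 1, k * k, 1, 1)
data = (10 * pos + idx).float()           # shape (1, 1, 9, 2, 2)

out = unfold_receptive_fields(data, k)    # shape (1, 1, 6, 6)

# The top-left 3x3 block holds the 9 features of position (0, 0):
# values 0..8 in row-major order.
print(out[0, 0, :3, :3])
# The block to its right holds those of position (0, 1): values 10..18.
print(out[0, 0, :3, 3:])
```

A k × k convolution with stride k over `out` therefore sees one weighted receptive field per step, which is what the final `self.conv` in `RFAConv` does.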