Step-by-Step Keras Implementation of Faster R-CNN, Part 11


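In the previous article we implemented ProposalLayer, which outputs the proposal rectangles. This article implements another custom layer, RoiPoolingLayer. In Faster R-CNN, the RoI pooling layer converts regions of interest (RoIs) of different sizes into fixed-size feature maps, which then serve as input to the subsequent stages.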

I. RoI Regions

As before, let's start with the figure from the paper:

[Figure: Faster R-CNN architecture diagram from the paper, showing where RoI pooling sits]

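The figure above marks where RoI pooling happens. In my view the figure is somewhat misleading, for the following reasons:

The feature maps in the figure should be much smaller than the input image. This is not a real problem, though. The input image was probably drawn small on purpose to make the figure easier to draw.
The box at proposals is drawn the same size as the box on the feature map at the RoI pooling step, and this is wrong. The RPN outputs proposal boxes: anchor boxes that have been refined by regression and then filtered by NMS, and that replace the Selective Search regions. Proposal coordinates are in the original image's coordinate system, so the red box at proposals should be as large as it would be on the original image. RoI pooling, on the other hand, has to scale each proposal down to the feature-map scale and use the feature map as its coordinate system. The two boxes in the figure should therefore have different sizes.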

With these points in mind, RoiPooling should be a little easier to understand.

II. Defining RoiPoolingLayer

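The pattern for writing custom Keras layers was covered in Step-by-Step Keras Implementation of Faster R-CNN, Part 10, so I won't explain it in as much detail here. Below is a partial definition, which we will fill in step by step.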

class RoiPoolingLayer(Layer):
    def __init__(self, pool_size = (7, 7), **kwargs):
        self.pool_size = pool_size
        super(RoiPoolingLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        super(RoiPoolingLayer, self).build(input_shape)

    def call(self, inputs):
        pass
        
    def compute_output_shape(self, input_shape):
        pass

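The definition above takes one initialization parameter, pool_size, which specifies the size that each RoI's output is resized to. The default is $(7, 7)$, but you can use other values if you prefer.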

1. The call function

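RoI pooling is implemented in call. To keep things simple, we reduce it to crop + resize. Each proposal's region is cropped out of the feature map and resized to pool_size with bilinear interpolation. This replaces the grid-wise max pooling described in the Fast R-CNN paper.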
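For contrast, the RoI pooling described in the Fast R-CNN paper works differently. It splits each RoI into a pool_size grid and takes the maximum inside every cell, rather than resizing with bilinear interpolation.

The NumPy sketch below does this for a single RoI on a single feature map. Some caveats:

- roi_max_pool is a hypothetical helper, and RoiPoolingLayer does not use it.
- The cell rounding (floor the start, ceil the end) is one common choice. It does not reproduce any particular implementation.

```python
import numpy as np

def roi_max_pool(feature, roi, pool_size=(7, 7)):
    """Classic RoI max pooling for one RoI (sketch, not used by RoiPoolingLayer).

    feature: array of shape (H, W, C)
    roi: (y1, x1, y2, x2) in integer feature-map coordinates, inclusive
    pool_size: (pooled_height, pooled_width)
    """
    y1, x1, y2, x2 = roi
    ph, pw = pool_size
    region = feature[y1:y2 + 1, x1:x2 + 1, :]
    h, w = region.shape[:2]
    out = np.empty((ph, pw, feature.shape[2]), dtype=feature.dtype)
    for i in range(ph):
        # Row range of grid cell i: floor the start, ceil the end, never empty
        ys = (i * h) // ph
        ye = max(ys + 1, -(-((i + 1) * h) // ph))
        for j in range(pw):
            xs = (j * w) // pw
            xe = max(xs + 1, -(-((j + 1) * w) // pw))
            # Max over the cell, separately for every channel
            out[i, j] = region[ys:ye, xs:xe].max(axis=(0, 1))
    return out

feature = np.arange(16, dtype=np.float32).reshape(4, 4, 1)
pooled = roi_max_pool(feature, (0, 0, 3, 3), pool_size=(2, 2))
print(pooled[..., 0])  # max of each 2x2 quadrant: 5, 7, 13, 15
```

Max pooling keeps only the strongest response in each cell. Crop + resize interpolates between neighbouring feature-map cells instead, which avoids the coordinate rounding that grid-wise pooling needs.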

Code first; the explanation follows.

def call(self, inputs):
    images, features, rois = inputs
    image_shape = tf.shape(images)[1: 3]
    feature_shape = tf.shape(features)
    roi_shape = tf.shape(rois)
    
    batch_size = feature_shape[0]
    num_rois = roi_shape[1]
    feature_channels = feature_shape[3]
    
    y_scale = 1.0 / tf.cast(image_shape[0] - 1, dtype = tf.float32)
    x_scale = 1.0 / tf.cast(image_shape[1] - 1, dtype = tf.float32)
    
    y1 = rois[..., 0] * y_scale
    x1 = rois[..., 1] * x_scale
    y2 = rois[..., 2] * y_scale
    x2 = rois[..., 3] * x_scale
    
    rois = tf.stack([y1, x1, y2, x2], axis = -1)
    
    # Assign each roi the index of its feature map within the batch
    indices = tf.range(batch_size, dtype = tf.int32)
    indices = tf.repeat(indices, num_rois, axis = -1)
    
    rois = tf.reshape(rois, (-1, roi_shape[-1]))

    crops = tf.image.crop_and_resize(image = features,
                                     boxes = rois,
                                     box_indices = indices,
                                     crop_size = self.pool_size,
                                     method = "bilinear")
    
    crops = tf.reshape(crops,
                       (batch_size, num_rois,
                        self.pool_size[0], self.pool_size[1], feature_channels))
    
    return crops

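The variable names are self-explanatory. inputs is a list of three elements: the original images, the feature maps and the proposal boxes. They are unpacked into images, features and rois.

Why do we need the images input? There are two reasons:

- It lets us read the input image size at run time, so the layer adapts to inputs of varying size.
- More importantly, scaling the proposals down to the feature-map scale requires scale factors computed from the image size. The code has two of them, y_scale and x_scale.

Why do both formulas subtract 1 from the image size?

Because we normalize the proposal coordinates to $[0, 1]$, so that they are also in $[0, 1]$ on the feature map. That alone does not explain the minus 1, so take a concrete example.

Suppose the input image size is $(350, 400)$ (height, width), and a proposal is $(200, 300, 349, 399)$ in the order $(y_1, x_1, y_2, x_2)$.

- Pixel coordinates start at 0, so the largest possible coordinates are 349 and 399, never 350 and 400. Dividing by 350 and 400 would therefore never reach 1.
- Dividing by the size minus 1, i.e. 349 and 399, maps the last row and column to exactly 1. The whole $[0, 1]$ range is then covered.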
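The effect of the minus 1 is easy to check numerically. The plain-Python sketch below normalizes the example box both ways. normalize_box is a hypothetical helper that mirrors the division done in call; it is not part of the layer.

```python
def normalize_box(box, image_hw, minus_one=True):
    """Scale an absolute (y1, x1, y2, x2) box into normalized coordinates.

    With minus_one=True the divisor is (size - 1), as in RoiPoolingLayer.call,
    so the last pixel row/column maps to exactly 1.0.
    """
    h, w = image_hw
    dh, dw = (h - 1, w - 1) if minus_one else (h, w)
    y1, x1, y2, x2 = box
    return (y1 / dh, x1 / dw, y2 / dh, x2 / dw)

box = (200, 300, 349, 399)   # example box, (y1, x1, y2, x2)
image_hw = (350, 400)        # (height, width)

print(normalize_box(box, image_hw))                   # y2 and x2 are exactly 1.0
print(normalize_box(box, image_hw, minus_one=False))  # y2 and x2 stay below 1.0
```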

How does multiplying the proposal coordinates by these scale factors bring them to the feature-map scale while keeping them in $[0, 1]$? Let's use the same example.

Scale factors:
$\begin{aligned} y_{scale} &= 1 / 349 \approx 0.0028653 \\ x_{scale} &= 1 / 399 \approx 0.0025063 \end{aligned}$
Normalized coordinates on the original image:

$\begin{aligned} y_1 &= 200 \times y_{scale} = 200 / 349 \approx 0.5731 \\ y_2 &= 349 \times y_{scale} = 349 / 349 = 1 \\ x_1 &= 300 \times x_{scale} = 300 / 399 \approx 0.7519 \\ x_2 &= 399 \times x_{scale} = 399 / 399 = 1 \end{aligned}$

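The feature map is 16 times smaller than the original image. The proposal's coordinates on the feature map (not yet normalized) are therefore:

$\begin{aligned} y_1 &= \lfloor 200 / 16 \rfloor = 12 \\ y_2 &= \lfloor 349 / 16 \rfloor = 21 \\ x_1 &= \lfloor 300 / 16 \rfloor = 18 \\ x_2 &= \lfloor 399 / 16 \rfloor = 24 \end{aligned}$

Now normalize them. First we need the feature-map size, which is also simple:

$\begin{aligned} h &= \lfloor 350 / 16 \rfloor = 21 \\ w &= \lfloor 400 / 16 \rfloor = 25 \end{aligned}$

The normalized coordinates are:

$\begin{aligned} y_1 &= 12 / 21 \approx 0.5714 \\ y_2 &= 21 / 21 = 1 \\ x_1 &= 18 / 25 = 0.72 \\ x_2 &= 24 / 25 = 0.96 \end{aligned}$

These are very close to the coordinates normalized on the original image. The difference comes from rounding: the image size is not a multiple of 16, and the coordinates are floor-divided by 16.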
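The hand calculation can be reproduced in a few lines of Python. This sketch follows the text's assumptions:

- a stride of 16;
- floor division onto the feature map;
- normalization on the feature map by h and w, as in the example.

The helper names are hypothetical.

```python
STRIDE = 16  # the feature map is 16x smaller than the input, as stated in the text

def normalize_on_image(box, image_hw):
    """Normalize on the original image, dividing by (size - 1) like call() does."""
    h, w = image_hw
    y1, x1, y2, x2 = box
    return (y1 / (h - 1), x1 / (w - 1), y2 / (h - 1), x2 / (w - 1))

def normalize_on_feature(box, image_hw, stride=STRIDE):
    """Floor-divide onto feature-map cells, then divide by the feature-map size,
    reproducing the hand calculation in the text."""
    fh, fw = image_hw[0] // stride, image_hw[1] // stride
    y1, x1, y2, x2 = (c // stride for c in box)
    return (y1 / fh, x1 / fw, y2 / fh, x2 / fw)

box, image_hw = (200, 300, 349, 399), (350, 400)
on_image = normalize_on_image(box, image_hw)
on_feature = normalize_on_feature(box, image_hw)
for a, b in zip(on_image, on_feature):
    print(f"{a:.4f}  {b:.4f}  diff={abs(a - b):.4f}")
```

The largest gap is in $x_2$ (1.0 vs 0.96). It arises because 400 // 16 = 25 divides exactly, while 399 // 16 drops to 24.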

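Why normalize the coordinates at all? What is wrong with the original ones?

Nothing, as such. They are just less convenient for uniform, parallel processing, whereas normalized coordinates do not depend on the size of the map being cropped. The more fundamental reason is that we use TensorFlow's tf.image.crop_and_resize. That function requires boxes in this form, and if you do not follow the convention you will not get correct results.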

Since tf.image.crop_and_resize has come up, it is worth explaining its parameters. Its signature is:

tf.image.crop_and_resize(
    image,
    boxes,
    box_indices,
    crop_size,
    method = "bilinear",
    extrapolation_value = 0.0,
    name = None
)

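image: The input image, here the feature maps, with shape [batch_size, height, width, channels].
boxes: A float Tensor of shape [num_boxes, 4] holding the bounding box of each RoI. Each box is a 4-tuple $(y_1, x_1, y_2, x_2)$, where $(y_1, x_1)$ is the top-left corner and $(y_2, x_2)$ the bottom-right corner, in normalized coordinates. Values are normally in $[0, 1]$, and a normalized $y$ is mapped to $y \times (\text{height} - 1)$ on the image. Values outside $[0, 1]$ are allowed; those positions are filled with extrapolation_value.
box_indices: An int32 Tensor of shape [num_boxes] giving, for each RoI, the index of the sample it belongs to. In other words, it says which image in the batch (here, which feature map) the RoI is taken from, and every RoI needs exactly one index. Put simply, it tells the op which map to cut the current proposal out of.
crop_size: Two integers [crop_height, crop_width], the size every crop is resized to (a tuple such as self.pool_size works).
method: The interpolation method used for resizing, "bilinear" (the default) or "nearest".
extrapolation_value: A float used to fill positions that fall outside the input image (coordinates beyond the image size). The default is 0. For example, suppose the feature map is $(18, 25)$ and the rectangle to crop is $(14, 19, 15, 26)$ in feature-map pixels. The positions beyond the feature map are then filled with this value.
name: A name for the operation (optional).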
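To make these parameters concrete, here is a NumPy reference sketch of what crop_and_resize does for a single box in bilinear mode. Some caveats:

- It is written from the documented behavior: a normalized y maps to y * (height - 1), and samples outside the image get extrapolation_value. It is not copied from TensorFlow's kernel.
- crop_and_resize_one is a hypothetical name.
- box_indices is left out because only one image is involved.

```python
import math
import numpy as np

def crop_and_resize_one(image, box, crop_size, extrapolation_value=0.0):
    """Bilinear crop-and-resize of ONE box from ONE image (reference sketch).

    image: array of shape (H, W, C)
    box: normalized (y1, x1, y2, x2); y maps to y * (H - 1), x to x * (W - 1)
    crop_size: (crop_height, crop_width)
    """
    image = np.asarray(image, dtype=np.float64)
    H, W, C = image.shape
    y1, x1, y2, x2 = box
    ch, cw = crop_size
    # Start with every output pixel set to the fill value for out-of-range samples
    out = np.full((ch, cw, C), extrapolation_value, dtype=np.float64)
    for i in range(ch):
        # Sample rows are spread evenly from y1 to y2 (or the centre if ch == 1)
        if ch > 1:
            in_y = y1 * (H - 1) + i * (y2 - y1) * (H - 1) / (ch - 1)
        else:
            in_y = 0.5 * (y1 + y2) * (H - 1)
        if in_y < 0 or in_y > H - 1:
            continue  # the whole output row falls outside the image
        top, bottom = math.floor(in_y), math.ceil(in_y)
        y_lerp = in_y - top
        for j in range(cw):
            if cw > 1:
                in_x = x1 * (W - 1) + j * (x2 - x1) * (W - 1) / (cw - 1)
            else:
                in_x = 0.5 * (x1 + x2) * (W - 1)
            if in_x < 0 or in_x > W - 1:
                continue  # this sample falls outside the image
            left, right = math.floor(in_x), math.ceil(in_x)
            x_lerp = in_x - left
            # Interpolate along x on the two neighbouring rows, then along y
            top_val = image[top, left] + (image[top, right] - image[top, left]) * x_lerp
            bot_val = image[bottom, left] + (image[bottom, right] - image[bottom, left]) * x_lerp
            out[i, j] = top_val + (bot_val - top_val) * y_lerp
    return out

img = np.arange(9, dtype=np.float32).reshape(3, 3, 1)
print(crop_and_resize_one(img, (0.0, 0.0, 1.0, 1.0), (5, 5))[..., 0])
```

With the box (0, 0, 1, 1) the crop covers the whole map, so a (3, 3) crop of this 3x3 image returns the image unchanged. A box that reaches past 1 produces extrapolation_value in the out-of-range samples.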

With the parameters understood, the code above should be easy to follow. The part that may be a little confusing is this:

# Assign each roi the index of its feature map within the batch
indices = tf.range(batch_size, dtype = tf.int32)
indices = tf.repeat(indices, num_rois, axis = -1)

rois = tf.reshape(rois, (-1, roi_shape[-1]))

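This part assigns each RoI the index of its feature map. The proposals output by ProposalLayer have shape [batch_size, num_rois, 4], so every image in the batch gets the same number of proposals, num_rois. After flattening:

- rows 0 to num_rois - 1 belong to the first image;
- rows num_rois to 2 * num_rois - 1 belong to the second image;
- and so on.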

indices = tf.range(batch_size, dtype = tf.int32): Produces the sequence 0 to batch_size - 1. With a batch size of 4 this is $[0, 1, 2, 3]$, meaning the proposals come from four images with indices 0, 1, 2 and 3.
indices = tf.repeat(indices, num_rois, axis = -1): Repeats each index num_rois times, which gives every proposal the index of its feature map within the batch: $[0, 0, \ldots, 0, 1, 1, \ldots, 1, 2, 2, \ldots, 2, 3, 3, \ldots, 3]$. This covers the regular case. When the layout is irregular you can build the indices by hand, e.g. $[0, 1, 2, 1, 1, 2, \ldots, 3, 1, 2]$, and the indices need not appear equally often (crop_and_resize only needs one index per box).
rois = tf.reshape(rois, (-1, roi_shape[-1])): Reshapes rois from [batch_size, num_rois, 4] to the [num_boxes, 4] layout that tf.image.crop_and_resize expects.
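The index bookkeeping can be checked without TensorFlow. The NumPy sketch below uses np.repeat and reshape as stand-ins for tf.repeat and tf.reshape, which behave the same way in this case. It confirms that flattened row k comes from image k // num_rois. The sizes are arbitrary.

```python
import numpy as np

batch_size, num_rois = 3, 2

# Proposals shaped like the ProposalLayer output: (batch_size, num_rois, 4)
rois = np.arange(batch_size * num_rois * 4, dtype=np.float32).reshape(batch_size, num_rois, 4)

# NumPy equivalents of tf.range(batch_size) + tf.repeat(..., num_rois)
indices = np.repeat(np.arange(batch_size, dtype=np.int32), num_rois)
print(indices)  # [0 0 1 1 2 2]

# Equivalent of tf.reshape(rois, (-1, 4)): shape (batch_size * num_rois, 4)
flat_rois = rois.reshape(-1, 4)

# Row k of flat_rois comes from image k // num_rois, matching indices[k]
for k in range(flat_rois.shape[0]):
    b, r = divmod(k, num_rois)
    assert indices[k] == b
    assert (flat_rois[k] == rois[b, r]).all()
```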

After these steps tf.image.crop_and_resize works as intended. It crops the region corresponding to each proposal out of the feature map and resizes it to $(7, 7)$. Finally, the last statement

crops = tf.reshape(crops,
                   (batch_size, num_rois,
                    self.pool_size[0], self.pool_size[1], feature_channels))

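reshapes the output to [batch_size, num_rois, pool_size[0], pool_size[1], feature_channels]. This groups the crops per image again so they can be processed batch-wise.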
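The reshape back relies on crop_and_resize returning crops in the same order as the flattened boxes. The NumPy sketch below uses dummy crops, each filled with its flattened box number, with arbitrary sizes. It shows that crops[b, r] ends up holding the crop of flattened box b * num_rois + r.

```python
import numpy as np

batch_size, num_rois, ph, pw, channels = 2, 3, 7, 7, 4

# Stand-in for the crop_and_resize output: one crop per flattened box, each
# filled with its flattened box number k so it can be tracked after reshaping
flat_crops = np.stack([np.full((ph, pw, channels), k, dtype=np.float32)
                       for k in range(batch_size * num_rois)])
print(flat_crops.shape)  # (6, 7, 7, 4)

# The final reshape in call(): split the box axis back into (batch, roi)
crops = flat_crops.reshape(batch_size, num_rois, ph, pw, channels)
print(crops.shape)       # (2, 3, 7, 7, 4)

# crops[b, r] is the crop of flattened box b * num_rois + r
for b in range(batch_size):
    for r in range(num_rois):
        assert (crops[b, r] == b * num_rois + r).all()
```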

2. The compute_output_shape function

This one is easy: it specifies the output shape.

def compute_output_shape(self, input_shape):
    image_shape, feature_shape, roi_shape = input_shape
    batch_size = image_shape[0]
    num_rois = roi_shape[1]
    feature_channels = feature_shape[3]
    
    return (batch_size, num_rois, self.pool_size[0], self.pool_size[1], feature_channels)
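Keras reports unknown dimensions as None, and this logic simply carries the unknown batch size through. A standalone copy of the method body makes this easy to check with plain tuples. The function name roi_pooling_output_shape is hypothetical, and the example shapes (512 channels, 256 proposals) are only for illustration.

```python
def roi_pooling_output_shape(input_shape, pool_size=(7, 7)):
    """Same logic as RoiPoolingLayer.compute_output_shape, as a plain function."""
    image_shape, feature_shape, roi_shape = input_shape
    batch_size = image_shape[0]
    num_rois = roi_shape[1]
    feature_channels = feature_shape[3]
    return (batch_size, num_rois, pool_size[0], pool_size[1], feature_channels)

# Example shapes: images, a 512-channel feature map, and 256 proposals per image
shapes = [(None, None, None, 3), (None, None, None, 512), (None, 256, 4)]
print(roi_pooling_output_shape(shapes))  # (None, 256, 7, 7, 512)
```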

With that, RoiPoolingLayer is complete. Here is the full definition:

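# Imports assumed here; earlier parts of the series may import these differently
import tensorflow as tf
from tensorflow.keras.layers import Layer

# Define RoiPoolingLayer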
class RoiPoolingLayer(Layer):
    def __init__(self, pool_size = (7, 7), **kwargs):
        self.pool_size = pool_size
        super(RoiPoolingLayer, self).__init__(**kwargs)

    def build(self, input_shape):
        super(RoiPoolingLayer, self).build(input_shape)

    def call(self, inputs):
        images, features, rois = inputs
        image_shape = tf.shape(images)[1: 3]
        feature_shape = tf.shape(features)
        roi_shape = tf.shape(rois)
        
        batch_size = feature_shape[0]
        num_rois = roi_shape[1]
        feature_channels = feature_shape[3]
        
        y_scale = 1.0 / tf.cast(image_shape[0] - 1, dtype = tf.float32)
        x_scale = 1.0 / tf.cast(image_shape[1] - 1, dtype = tf.float32)
        
        y1 = rois[..., 0] * y_scale
        x1 = rois[..., 1] * x_scale
        y2 = rois[..., 2] * y_scale
        x2 = rois[..., 3] * x_scale
        
        rois = tf.stack([y1, x1, y2, x2], axis = -1)
        
        # Assign each roi the index of its feature map within the batch
        indices = tf.range(batch_size, dtype = tf.int32)
        indices = tf.repeat(indices, num_rois, axis = -1)
        
        rois = tf.reshape(rois, (-1, roi_shape[-1]))

        crops = tf.image.crop_and_resize(image = features,
                                         boxes = rois,
                                         box_indices = indices,
                                         crop_size = self.pool_size,
                                         method = "bilinear")
        
        crops = tf.reshape(crops,
                           (batch_size, num_rois,
                            self.pool_size[0], self.pool_size[1], feature_channels))
        
        return crops
    
    def compute_output_shape(self, input_shape):
        image_shape, feature_shape, roi_shape = input_shape
        batch_size = image_shape[0]
        num_rois = roi_shape[1]
        feature_channels = feature_shape[3]
        
        return (batch_size, num_rois, self.pool_size[0], self.pool_size[1], feature_channels)

III. Adding RoiPoolingLayer to the Model

Now add RoiPoolingLayer to the model:

# RoiPooling model
x = keras.layers.Input(shape = (None, None, 3), name = "input")

feature = vgg16_conv(x)
rpn_cls, rpn_reg = rpn(feature)

proposal = ProposalLayer(base_anchors, num_rois = TRAIN_NUM, iou_thres = 0.7,
                         name = "proposal")([x, rpn_cls, rpn_reg])

roi_pooling = RoiPoolingLayer(name = "roi_pooling")([x, feature, proposal])

roi_pooling_model = keras.Model(x, roi_pooling, name = "roi_pooling_model")

roi_pooling_model.summary()

With the model in place we can test it. First, though, we need to load the weights trained in Step-by-Step Keras Implementation of Faster R-CNN, Part 8:

# Load the trained weights; by_name = True loads weights only into layers whose names match those in the file
roi_pooling_model.load_weights(osp.join(log_path, "faster_rcnn_weights.h5"), by_name = True)

Next, define a prediction function:

# Prediction with the roi_pooling model
# x: an image path (predicts a single image) or a batch already returned by input_reader
# Returns: the RoI pooling output for x
def roi_pooling_predict(x):
    # If x is an image path, preprocess the image into the network input format
    # Otherwise x is a batch returned by input_reader and already in the input format
    if isinstance(x, str):
        img_src = cv.imread(x)
        img_new, scale = new_size_image(img_src, SHORT_SIZE)
        x = [img_new]
        x = np.array(x).astype(np.float32) / 255.0
        
    y = roi_pooling_model.predict(x)
    
    return y

# Use the test split created during training
test_reader = input_reader(test_set, CATEGORIES, batch_size = 4, train_mode = False)

Now for the moment of truth:

# roi_pooling test
x, y = next(test_reader)
outputs = roi_pooling_predict(x)
print(x.shape, outputs.shape)
print(outputs)

The output is as follows:

(4, 325, 400, 3) (4, 256, 7, 7, 512)
[[[[[0.00000000e+00 0.00000000e+00 0.00000000e+00 ... 0.00000000e+00
     8.52627680e-03 0.00000000e+00]
    [0.00000000e+00 0.00000000e+00 0.00000000e+00 ... 0.00000000e+00
     3.18351114e-04 0.00000000e+00]
    [0.00000000e+00 0.00000000e+00 0.00000000e+00 ... 0.00000000e+00
     9.16954782e-03 0.00000000e+00]
    ...
    [0.00000000e+00 0.00000000e+00 0.00000000e+00 ... 0.00000000e+00
     2.82486826e-02 0.00000000e+00]
    [0.00000000e+00 0.00000000e+00 0.00000000e+00 ... 0.00000000e+00
     3.77882309e-02 0.00000000e+00]
    [0.00000000e+00 0.00000000e+00 0.00000000e+00 ... 0.00000000e+00
     3.84687856e-02 0.00000000e+00]]