1. Background
Computer vision is an important branch of artificial intelligence that studies how computers can understand and process images and video. Object detection is one of its core tasks: identifying and localizing objects of specific categories in an image. With the rise of deep learning, detection pipelines built on deep networks have largely replaced traditional methods, far surpassing them in accuracy and overall performance. This article surveys deep-learning-based object detection, covering its core concepts, algorithm principles, concrete steps, mathematical models, a code example, and future trends.
2. Core Concepts and Connections
In deep learning, object detection typically uses a convolutional neural network (CNN) as the underlying feature extractor and builds the detection task on top of those features. The main deep-learning detection methods include:
- R-CNN (Regions with CNN features): a two-stage method that first generates a large number of region proposals and then classifies each proposal and regresses its box coordinates.
- Fast R-CNN: an improved R-CNN that merges feature extraction, classification, and box regression into a single network, so that features are computed once per image and shared across all proposals, greatly increasing detection speed.
- Faster R-CNN: an improvement on Fast R-CNN that introduces a Region Proposal Network (RPN) to generate proposals automatically, further improving both speed and accuracy.
- YOLO (You Only Look Once): a one-stage method that divides the image into a grid and performs detection within each grid cell, achieving real-time speed.
- SSD (Single Shot MultiBox Detector): another one-stage method that adds prediction layers on feature maps of several resolutions, enabling detection of objects at multiple scales.
3. Core Algorithm Principles, Concrete Steps, and Mathematical Models
3.1 R-CNN
R-CNN is a two-stage object detection method. Its main steps are:
- A convolutional neural network (e.g., VGG-16) extracts features from the input, producing a feature map of size $W \times H \times D$, where $W$ and $H$ are the spatial width and height of the feature map and $D$ is the number of feature channels.
- A large number of region proposals is generated, typically with selective search or other low-level cues such as edges.
- The features of each proposal are fed to a classifier and a regressor, which decide whether the proposal contains an object and refine the object's position and size.
Classification and localization in R-CNN are implemented with a softmax classifier and a bounding-box regressor. For each proposal $b$ we define a classification vector $c_b$, where $c_{b,i}$ is the probability that the object in proposal $b$ belongs to class $i$, and a regression vector $r_b = (\delta_x, \delta_y, \delta_w, \delta_h)$ containing offsets for the proposal's center $(x, y)$, width $w$, and height $h$. The refined box is computed as:
$$ x = x_0 + w_0\,\delta_x $$
$$ y = y_0 + h_0\,\delta_y $$
$$ w = w_0 \exp(\delta_w) $$
$$ h = h_0 \exp(\delta_h) $$
where $(x_0, y_0, w_0, h_0)$ are the center and size of the original proposal and $(\delta_x, \delta_y, \delta_w, \delta_h)$ are the offsets predicted by the regressor; the exponential form keeps the predicted width and height positive.
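This decoding step translates directly into code. Below is a minimal sketch (the function and variable names are ours, not from any particular library) that applies predicted offsets to a proposal box:

```python
import numpy as np

def decode_box(proposal, deltas):
    """Apply predicted offsets to a proposal box.

    proposal: (x0, y0, w0, h0) -- center, width, height of the proposal
    deltas:   (dx, dy, dw, dh) -- offsets predicted by the regressor
    Returns the refined box (x, y, w, h).
    """
    x0, y0, w0, h0 = proposal
    dx, dy, dw, dh = deltas
    x = x0 + w0 * dx          # shift the center, scaled by box size
    y = y0 + h0 * dy
    w = w0 * np.exp(dw)       # exponentiate so width/height stay positive
    h = h0 * np.exp(dh)
    return x, y, w, h

# Example: a 100x80 proposal centered at (50, 60)
print(decode_box((50, 60, 100, 80), (0.1, -0.05, 0.2, 0.0)))
# -> (60.0, 56.0, ~122.1, 80.0)
```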
3.2 Fast R-CNN
Fast R-CNN speeds up detection by computing the convolutional features only once per image and sharing them across all proposals. The steps are:
- A convolutional neural network (e.g., VGG-16) extracts features from the input image, producing a feature map of size $W \times H \times D$.
- Region proposals (still produced externally, e.g., by selective search) are projected onto the feature map, and an RoI pooling layer extracts a fixed-size feature vector for each proposal.
- Each pooled feature vector is passed through fully connected layers to a classifier and a box regressor, which decide whether the proposal contains an object and refine its position and size.
Fast R-CNN uses the same classification and regression formulation as R-CNN.
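RoI pooling is what lets Fast R-CNN share one feature-extraction pass across all proposals. Below is a deliberately simplified NumPy sketch of the idea (no sub-pixel handling; the 7×7 output follows the paper's setting; all names are ours):

```python
import numpy as np

def roi_pool(feature_map, roi, output_size=(7, 7)):
    """Max-pool one region of interest to a fixed spatial size.

    feature_map: array of shape (H, W, D)
    roi: (x1, y1, x2, y2) in integer feature-map coordinates
    Returns an array of shape (output_size[0], output_size[1], D).
    """
    x1, y1, x2, y2 = roi
    region = feature_map[y1:y2, x1:x2, :]
    h, w = region.shape[:2]
    out_h, out_w = output_size
    pooled = np.zeros((out_h, out_w, feature_map.shape[2]))
    # Split the region into an out_h x out_w grid and take the max of each cell
    ys = np.linspace(0, h, out_h + 1).astype(int)
    xs = np.linspace(0, w, out_w + 1).astype(int)
    for i in range(out_h):
        for j in range(out_w):
            cell = region[ys[i]:max(ys[i + 1], ys[i] + 1),
                          xs[j]:max(xs[j + 1], xs[j] + 1), :]
            pooled[i, j, :] = cell.max(axis=(0, 1))
    return pooled

features = np.random.rand(38, 50, 512)            # e.g. a VGG-16 conv5 map
print(roi_pool(features, (10, 5, 30, 25)).shape)  # (7, 7, 512)
```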
3.3 Faster R-CNN
Faster R-CNN introduces a Region Proposal Network (RPN) that generates proposals directly from the shared feature map, improving both speed and accuracy. The steps are:
- A convolutional neural network (e.g., VGG-16) extracts features from the input image, producing a feature map of size $W \times H \times D$.
- The Region Proposal Network slides over the feature map and, at each position, scores and refines a set of anchor boxes to produce region proposals.
- Each proposal's pooled features are classified and regressed, as in Fast R-CNN, to decide whether it contains an object and to refine its position and size.
Faster R-CNN uses the same classification and regression formulation as R-CNN and Fast R-CNN.
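To make the RPN concrete, here is a minimal sketch of anchor generation, assuming the 3 scales × 3 aspect ratios and stride-16 feature map used in the Faster R-CNN paper (the function name and layout are ours):

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors of shape (feat_h * feat_w * 9, 4) as (cx, cy, w, h)."""
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            # Center of this feature-map cell in image coordinates
            cx, cy = j * stride + stride / 2, i * stride + stride / 2
            for s in scales:
                for r in ratios:
                    w = s * np.sqrt(r)   # keep area ~ s^2 while varying ratio
                    h = s / np.sqrt(r)
                    anchors.append((cx, cy, w, h))
    return np.array(anchors)

# A ~600x800 image with stride-16 features gives a 38x50 map -> 17100 anchors
print(generate_anchors(38, 50).shape)  # (17100, 4)
```

The RPN then attaches two small convolutional heads to the feature map: one scores each anchor as object vs. background, the other predicts the four box offsets decoded with the formulas above.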
3.4 YOLO
YOLO is a one-stage object detection method. The steps are:
- A convolutional neural network extracts features from the input image, producing a feature map of size $W \times H \times D$.
- The image is divided into an $S \times S$ grid, and each grid cell predicts $B$ bounding boxes.
- For each grid cell, a classifier predicts the class probabilities of any object whose center falls in the cell, and each of the $B$ box predictors regresses the box's position, size, and an objectness confidence.
Classification and localization in YOLO follow the same pattern. For each grid cell $g$ we define a classification vector $c_g$, where $c_{g,i}$ is the probability that the object in cell $g$ belongs to class $i$, and for each of the cell's $B$ boxes a regression vector $r^k_g = (\delta_x, \delta_y, \delta_w, \delta_h)$ of offsets for the box center $(x, y)$, width $w$, and height $h$. Each box is decoded as:
$$ x = x_0 + w_0\,\delta_x $$
$$ y = y_0 + h_0\,\delta_y $$
$$ w = w_0 \exp(\delta_w) $$
$$ h = h_0 \exp(\delta_h) $$
where $(x_0, y_0, w_0, h_0)$ are the center and size of the cell's prior box and $(\delta_x, \delta_y, \delta_w, \delta_h)$ are the predicted offsets, mirroring the R-CNN formulation.
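To see how a one-stage detector packs its predictions, here is a sketch of decoding a YOLO-style output tensor of shape $S \times S \times (B \cdot 5 + C)$; the exact memory layout is our assumption and differs between YOLO versions:

```python
import numpy as np

S, B, C = 7, 2, 20                      # grid size, boxes per cell, classes
pred = np.random.rand(S, S, B * 5 + C)  # stand-in for the network output

boxes = []
for gy in range(S):
    for gx in range(S):
        cell = pred[gy, gx]
        class_probs = cell[B * 5:]      # class probabilities, shared per cell
        for b in range(B):
            x_off, y_off, w, h, conf = cell[b * 5: b * 5 + 5]
            # Box center is offset within its grid cell, in [0, 1] image coords
            x = (gx + x_off) / S
            y = (gy + y_off) / S
            score = conf * class_probs.max()   # class-specific confidence
            boxes.append((x, y, w, h, score, class_probs.argmax()))

print(len(boxes))  # S * S * B = 98 candidate boxes before thresholding
```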
3.5 SSD
SSD is another one-stage object detection method. The steps are:
- A convolutional neural network extracts features from the input image, producing feature maps at several resolutions.
- Prediction layers attached to these feature maps of different resolutions generate default boxes at multiple scales.
- Each default box is classified and regressed to decide whether it contains an object and to refine the object's position and size.
SSD uses the same classification and regression formulation as YOLO.
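The multi-scale design can be made concrete with the SSD300 configuration from the paper: six feature maps of decreasing resolution, each contributing default boxes, so that large maps catch small objects and small maps catch large ones. The sketch below only does the bookkeeping; the actual prediction heads are 3×3 convolutions:

```python
# Feature map sizes used by SSD300 and default boxes per location
feature_maps = [(38, 38, 4), (19, 19, 6), (10, 10, 6),
                (5, 5, 6), (3, 3, 4), (1, 1, 4)]
num_classes = 21  # 20 VOC classes + background

total_boxes = 0
for h, w, k in feature_maps:
    n = h * w * k
    total_boxes += n
    # Each location predicts k boxes, each with 4 offsets + class scores,
    # produced by a 3x3 conv with k * (4 + num_classes) output channels
    print(f"{h}x{w} map: {n} default boxes")

print("total:", total_boxes)  # 8732 default boxes for SSD300
```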
4. Code Example with Detailed Explanation
Here we walk through a Faster R-CNN object detection example implemented with Python and TensorFlow. First, install the required libraries:
```bash
pip install tensorflow
pip install tensorflow-object-detection-api
```
Next, clone the TensorFlow models repository and compile its protocol buffers; a pretrained Faster R-CNN checkpoint (e.g., `faster_rcnn_resnet101_coco` from the TensorFlow Object Detection model zoo) can then be downloaded:
```bash
git clone https://github.com/tensorflow/models.git
cd models/research
protoc object_detection/protos/*.proto --python_out=.
```
然后,我們可以使用以下代碼加載模型并進(jìn)行對(duì)象檢測(cè):
```python
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf
from PIL import Image

from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils

# Load the pretrained Faster R-CNN model (a frozen inference graph)
model_path = 'path/to/faster_rcnn_resnet101_coco/frozen_inference_graph.pb'
detection_graph = tf.Graph()
with detection_graph.as_default():
    od_graph_def = tf.compat.v1.GraphDef()
    with tf.io.gfile.GFile(model_path, 'rb') as fid:
        od_graph_def.ParseFromString(fid.read())
    tf.compat.v1.import_graph_def(od_graph_def, name='')

sess = tf.compat.v1.Session(graph=detection_graph)

# Load the category names
label_map_path = 'path/to/label_map.pbtxt'
category_index = label_map_util.create_category_index_from_labelmap(
    label_map_path, use_display_name=True)

# Read an image (placeholder path) and run object detection
image_np = np.array(Image.open('path/to/image.jpg'))

image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
detection_boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
detection_scores = detection_graph.get_tensor_by_name('detection_scores:0')
detection_classes = detection_graph.get_tensor_by_name('detection_classes:0')
num_detections = detection_graph.get_tensor_by_name('num_detections:0')

(boxes, scores, classes, num) = sess.run(
    [detection_boxes, detection_scores, detection_classes, num_detections],
    feed_dict={image_tensor: np.expand_dims(image_np, axis=0)})

# Draw the detection results on the image
viz_utils.visualize_boxes_and_labels_on_image_array(
    image_np,
    np.squeeze(boxes),
    np.squeeze(classes).astype(np.int32),
    np.squeeze(scores),
    category_index,
    use_normalized_coordinates=True,
    max_boxes_to_draw=200,
    min_score_thresh=.30,
    agnostic_mode=False)

plt.imshow(image_np)
plt.show()
```
In this example we use a Faster R-CNN model for object detection: we load the pretrained frozen graph and the category names, read an image and convert it to a batched tensor, run the model, and finally draw the detected boxes, labels, and scores on the image.
5. Future Trends and Challenges
Deep-learning object detection has made remarkable progress, but several challenges remain:
- Data scarcity: object detection requires large amounts of annotated data, and collecting and maintaining annotations is time-consuming and labor-intensive.
- Real-time performance: existing methods achieve good accuracy, but there is still room to improve inference speed.
- Model complexity: deep models typically have very large numbers of parameters, raising compute-cost and model-size concerns.
- Generalization: deep models may generalize poorly beyond their training distribution, degrading performance in new scenes and tasks.
Promising future directions include:
- Self-supervised learning: pretraining on unlabeled data to reduce the dependence on annotations.
- Zero-shot object detection: using textual descriptions instead of image annotations, extending detection to unseen categories.
- Model compression: shrinking models with knowledge distillation, quantization, and related techniques to improve real-time performance (see the sketch after this list).
- Multimodal learning: combining vision with other sensing modalities (e.g., speech, touch) to improve detection accuracy and generalization.
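As one concrete illustration of model compression, post-training quantization can shrink a trained detector several-fold with modest accuracy loss. A minimal TensorFlow Lite sketch (the SavedModel path is a placeholder):

```python
import tensorflow as tf

# Convert a trained detector to a TFLite model with quantized weights.
# 'saved_model_dir' is a placeholder path to your exported SavedModel.
converter = tf.lite.TFLiteConverter.from_saved_model('saved_model_dir')
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enables weight quantization
tflite_model = converter.convert()

with open('detector_quantized.tflite', 'wb') as f:
    f.write(tflite_model)
```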
6. Conclusion
Deep-learning object detection has advanced substantially and is now widely deployed in practice. This article introduced the main techniques and algorithmic principles of deep-learning object detection and walked through a concrete code example. The remaining challenges include data scarcity, real-time performance, model complexity, and generalization; promising directions for addressing them include self-supervised learning, zero-shot detection, model compression, and multimodal learning. Continued progress in this area will keep driving new innovations and applications across computer vision and artificial intelligence.