Preface
This post gives a brief introduction to monocular (single-camera) 3D object detection, then uses the MMDetection3D toolbox to train, test, and visualize on KITTI (SMOKE, PGD) and nuScenes-Mini (FCOS3D, PGD).
Monocular 3D Detection
Overview
Monocular 3D detection, as the name suggests, uses a single camera to capture images, feeds each image into a model, and predicts a 3D box and class label for every object of interest. As one might expect, however, a single image does not carry enough 3D information (depth is missing), which is why in previous years research focused on LiDAR-based methods (e.g. PointNet, VoxelNet, PointPillars), which greatly improved 3D detection accuracy.
LiDAR-based methods, however, suffer from high sensor cost and sensitivity to weather, while camera-based methods can make the detection system more robust, especially when other, more expensive modules fail. Reliable and accurate 3D detection from single- or multi-camera data is therefore an important problem. Much effort has gone into the object localization problem in camera-based detection, for example inferring depth from images and exploiting geometric constraints and shape priors, yet the problem is far from solved: because of their poor 3D localization ability, camera-based detectors still perform much worse than LiDAR-based ones.
Detection Algorithms
The figure below comes from the paper 3D Object Detection for Autonomous Driving: A Review and New Outlooks, which gives a detailed survey of monocular 3D object detection work from 2015 through the first half of 2022, broken down into five sub-directions: Image-only Monocular 3D Object Detectors, Depth-assisted Monocular 3D Object Detectors, Prior-guided Monocular 3D Object Detectors, Stereo-based 3D Object Detectors, and Multi-camera 3D Object Detectors. In my view the survey is fairly comprehensive and objective.
For the individual papers, see: Learning-Deep-Learning.
It is also worth following the leaderboard of the Detection task on the nuScenes website: select the Camera track to see the continuously updated SOTA methods.
The nuScenes-Mini Dataset
Official site: https://www.nuscenes.org/
Company: Motional (formerly nuTonomy)
More information: https://www.nuscenes.org/nuscenes
Introduction
The nuScenes dataset (pronounced /nuːsiːnz/) is a large-scale autonomous driving dataset with 3D object annotations, developed by the team at Motional (formerly nuTonomy). Its features:
- Full sensor suite (1x LIDAR, 5x RADAR, 6x camera, IMU, GPS)
- 1000 scenes of 20s each
- 1,400,000 camera images
- 390,000 lidar sweeps
- Two diverse cities: Boston and Singapore
- Left versus right hand traffic
- Detailed map information
- 1.4M 3D bounding boxes manually annotated for 23 object classes
- Attributes such as visibility, activity and pose
- New: 1.1B lidar points manually annotated for 32 classes
- New: Explore nuScenes on Strada
- Free to use for non-commercial use
Download
The full nuScenes dataset is over 500 GB (a real burden). To keep things manageable, save time, and make it easier to get started (plenty of excuses, I know), the hands-on sections below use the nuScenes-Mini dataset. The official description reads:
Full dataset (v1.0)
In March 2019 we released the full nuScenes dataset with 1000 scenes. Due to the huge size of the dataset, we provide the mini, trainval and test splits separately. Mini (10 scenes) is a subset of trainval used to explore the data without having to download the entire dataset. Trainval (700+150 scenes) is packaged into 10 different archives that each contain 85 scenes. Test (150 scenes) is used for challenges and does not come with object annotations. Alternatively, it is also possible to download only selected modalities (camera, lidar, radar) or only keyframes. The meta data is provided separately and includes the annotations, ego vehicle poses, calibration, maps and log information.
Download method 1: manual download and extraction (not recommended)
Register an account on the official website, open the nuScenes Downloads page, and scroll to the bottom to find the download links for the Full dataset (v1.0), including the Mini split.
Download method 2: download from the command line (recommended)
The nuScenes website has a detailed Tutorial. For convenience in the experiments below, we download into the mmdetection3d/data/nuscenes-mini/ directory (commands run from the mmdetection3d root):
```shell
# Make the directory to store the nuScenes dataset in.
mkdir -p data/nuscenes-mini
# Download the nuScenes mini split into that directory.
wget https://www.nuscenes.org/data/v1.0-mini.tgz -P data/nuscenes-mini
# Uncompress the nuScenes mini split.
tar -xf data/nuscenes-mini/v1.0-mini.tgz -C data/nuscenes-mini
# Install the nuScenes devkit.
pip install nuscenes-devkit
```
After extraction, the dataset directory structure looks like this:
```
mmdetection3d
├── mmdet3d
├── tools
├── configs
├── data
│   ├── nuscenes-mini
│   │   ├── maps
│   │   ├── samples
│   │   ├── sweeps
│   │   ├── v1.0-mini
```
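To sanity-check the data, you can load the mini split with the devkit we just installed. A minimal sketch, assuming the directory layout above:

```python
from nuscenes.nuscenes import NuScenes

# Load the v1.0-mini split from the directory we extracted into.
nusc = NuScenes(version='v1.0-mini', dataroot='data/nuscenes-mini', verbose=True)

nusc.list_scenes()  # the mini split contains 10 scenes

# Inspect the first sample of the first scene: each sample links to
# the data of all sensor channels (CAM_FRONT, LIDAR_TOP, ...).
first_sample = nusc.get('sample', nusc.scene[0]['first_sample_token'])
print(first_sample['data'].keys())
```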
MMDetection3D
Version: Release v1.1.0rc0
Installation and environment setup: [MMDetection3D] Environment Setup: Training, Testing & Visualizing the KITTI Dataset with PointPillars
Next, we run the following experiments with MMDet3D:
Monocular algorithm | KITTI | nuScenes-Mini |
---|---|---|
PGD | √ | √ |
SMOKE | √ | |
FCOS3D | | √ |
That is: train, test, and visualize on KITTI with PGD and SMOKE, and on nuScenes-Mini with PGD and FCOS3D.
Dataset Preparation
KITTI
3D Object Detection: the KITTI dataset
nuScenes-Mini
3D Object Detection: the nuScenes dataset
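The mono3D configs below expect converted annotation files such as nuscenes_infos_train_mono3d.coco.json. If you have not generated them yet, they are produced by MMDetection3D's data converter; a sketch of the command for the mini split (paths assume the directory layout above, and --version selects the v1.0-mini annotations):

```shell
python tools/create_data.py nuscenes --root-path ./data/nuscenes-mini --out-dir ./data/nuscenes-mini --extra-tag nuscenes --version v1.0-mini
```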
Configuration Files
1. First, modify the dataset path.
Taking nuScenes-Mini as the example: in the /mmdetection3d/configs/_base_/datasets/ folder, create a new file nus-mini-mono3d.py (the nuScenes-Mini data config for monocular detection), copy into it the contents of nus-mono3d.py from the same folder, and modify the data_root parameter:
```python
dataset_type = 'NuScenesMonoDataset'
# Change this to your dataset path
data_root = 'your_dataset_root'
class_names = [
    'car', 'truck', 'trailer', 'bus', 'construction_vehicle', 'bicycle',
    'motorcycle', 'pedestrian', 'traffic_cone', 'barrier'
]
# Input modality for nuScenes dataset, this is consistent with the submission
# format which requires the information in input_modality.
input_modality = dict(
    use_lidar=False,
    use_camera=True,
    use_radar=False,
    use_map=False,
    use_external=False)
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='LoadAnnotations3D',
        with_bbox=True,
        with_label=True,
        with_attr_label=True,
        with_bbox_3d=True,
        with_label_3d=True,
        with_bbox_depth=True),
    dict(type='Resize', img_scale=(1600, 900), keep_ratio=True),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(
        type='Collect3D',
        keys=[
            'img', 'gt_bboxes', 'gt_labels', 'attr_labels', 'gt_bboxes_3d',
            'gt_labels_3d', 'centers2d', 'depths'
        ]),
]
test_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='MultiScaleFlipAug',
        scale_factor=1.0,
        flip=False,
        transforms=[
            dict(type='RandomFlip3D'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['img']),
        ])
]
# construct a pipeline for data and gt loading in show function
# please keep its loading function consistent with test_pipeline (e.g. client)
eval_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='DefaultFormatBundle3D',
        class_names=class_names,
        with_label=False),
    dict(type='Collect3D', keys=['img'])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'nuscenes_infos_train_mono3d.coco.json',
        img_prefix=data_root,
        classes=class_names,
        pipeline=train_pipeline,
        modality=input_modality,
        test_mode=False,
        box_type_3d='Camera'),
    val=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'nuscenes_infos_val_mono3d.coco.json',
        img_prefix=data_root,
        classes=class_names,
        pipeline=test_pipeline,
        modality=input_modality,
        test_mode=True,
        box_type_3d='Camera'),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file=data_root + 'nuscenes_infos_val_mono3d.coco.json',
        img_prefix=data_root,
        classes=class_names,
        pipeline=test_pipeline,
        modality=input_modality,
        test_mode=True,
        box_type_3d='Camera'))
evaluation = dict(interval=2)
```
2. Modify the model config file (PGD as the example).
In the /mmdetection3d/configs/pgd/ folder, create a new file pgd_r101_caffe_fpn_gn-head_2x16_1x_nus-mini-mono3d.py, copy into it the contents of pgd_r101_caffe_fpn_gn-head_2x16_1x_nus-mono3d.py from the same folder, and point the dataset entry in _base_ at '../_base_/datasets/nus-mini-mono3d.py':
```python
_base_ = [
    '../_base_/datasets/nus-mini-mono3d.py', '../_base_/models/pgd.py',
    '../_base_/schedules/mmdet_schedule_1x.py', '../_base_/default_runtime.py'
]
# model settings
model = dict(
    backbone=dict(
        dcn=dict(type='DCNv2', deform_groups=1, fallback_on_stride=False),
        stage_with_dcn=(False, False, True, True)),
    bbox_head=dict(
        pred_bbox2d=True,
        group_reg_dims=(2, 1, 3, 1, 2,
                        4),  # offset, depth, size, rot, velo, bbox2d
        reg_branch=(
            (256, ),  # offset
            (256, ),  # depth
            (256, ),  # size
            (256, ),  # rot
            (),  # velo
            (256, )  # bbox2d
        ),
        loss_depth=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0),
        bbox_coder=dict(
            type='PGDBBoxCoder',
            base_depths=((31.99, 21.12), (37.15, 24.63), (39.69, 23.97),
                         (40.91, 26.34), (34.16, 20.11), (22.35, 13.70),
                         (24.28, 16.05), (27.26, 15.50), (20.61, 13.68),
                         (22.74, 15.01)),
            base_dims=((4.62, 1.73, 1.96), (6.93, 2.83, 2.51),
                       (12.56, 3.89, 2.94), (11.22, 3.50, 2.95),
                       (6.68, 3.21, 2.85), (6.68, 3.21, 2.85),
                       (2.11, 1.46, 0.78), (0.73, 1.77, 0.67),
                       (0.41, 1.08, 0.41), (0.50, 0.99, 2.52)),
            code_size=9)),
    # set weight 1.0 for base 7 dims (offset, depth, size, rot)
    # 0.05 for 2-dim velocity and 0.2 for 4-dim 2D distance targets
    train_cfg=dict(code_weight=[
        1.0, 1.0, 0.2, 1.0, 1.0, 1.0, 1.0, 0.05, 0.05, 0.2, 0.2, 0.2, 0.2
    ]),
    test_cfg=dict(nms_pre=1000, nms_thr=0.8, score_thr=0.01, max_per_img=200))
class_names = [
    'car', 'truck', 'trailer', 'bus', 'construction_vehicle', 'bicycle',
    'motorcycle', 'pedestrian', 'traffic_cone', 'barrier'
]
img_norm_cfg = dict(
    mean=[103.530, 116.280, 123.675], std=[1.0, 1.0, 1.0], to_rgb=False)
train_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='LoadAnnotations3D',
        with_bbox=True,
        with_label=True,
        with_attr_label=True,
        with_bbox_3d=True,
        with_label_3d=True,
        with_bbox_depth=True),
    dict(type='Resize', img_scale=(1600, 900), keep_ratio=True),
    dict(type='RandomFlip3D', flip_ratio_bev_horizontal=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle3D', class_names=class_names),
    dict(
        type='Collect3D',
        keys=[
            'img', 'gt_bboxes', 'gt_labels', 'attr_labels', 'gt_bboxes_3d',
            'gt_labels_3d', 'centers2d', 'depths'
        ]),
]
test_pipeline = [
    dict(type='LoadImageFromFileMono3D'),
    dict(
        type='MultiScaleFlipAug',
        scale_factor=1.0,
        flip=False,
        transforms=[
            dict(type='RandomFlip3D'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(
                type='DefaultFormatBundle3D',
                class_names=class_names,
                with_label=False),
            dict(type='Collect3D', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(pipeline=train_pipeline),
    val=dict(pipeline=test_pipeline),
    test=dict(pipeline=test_pipeline))
# optimizer
optimizer = dict(
    lr=0.004, paramwise_cfg=dict(bias_lr_mult=2., bias_decay_mult=0.))
optimizer_config = dict(
    _delete_=True, grad_clip=dict(max_norm=35, norm_type=2))
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=1.0 / 3,
    step=[8, 11])
total_epochs = 12
evaluation = dict(interval=4)
runner = dict(max_epochs=total_epochs)
```
3. Adjust the number of training epochs, the checkpoint-saving interval, etc. (optional)
- In each model's config file, if the runner is an EpochBasedRunner, you can change the number of training epochs directly via the max_epochs parameter.
- In /mmdetection3d/configs/_base_/default_runtime.py, change the interval parameter on the first line to adjust how often checkpoints are saved.
A minimal sketch of both settings is shown below.
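Sketch (the values here are illustrative; the exact file contents may differ between MMDetection3D versions):

```python
# configs/_base_/default_runtime.py -- checkpoint-saving interval
checkpoint_config = dict(interval=1)  # save a checkpoint every epoch

# model config -- total training epochs for an EpochBasedRunner
runner = dict(type='EpochBasedRunner', max_epochs=12)
```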
Multi-GPU Training
Taking PGD on nuScenes-Mini as the example, run the following command in the terminal. Training outputs are saved by default to the /mmdetection3d/work_dirs/pgd_r101_caffe_fpn_gn-head_2x16_1x_nus-mini-mono3d folder:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 bash ./tools/dist_train.sh configs/pgd/pgd_r101_caffe_fpn_gn-head_2x16_1x_nus-mini-mono3d.py 4
```
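If you only have a single GPU, the non-distributed entry point tools/train.py takes the same config; a sketch (the optional --work-dir flag overrides the default work_dirs/<config name> location):

```shell
python tools/train.py configs/pgd/pgd_r101_caffe_fpn_gn-head_2x16_1x_nus-mini-mono3d.py
```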
Testing and Visualization
If you run testing and visualization from VS Code, you first need to set the $DISPLAY environment variable.
First, run echo $DISPLAY in MobaXterm to check the DISPLAY value of the current session:
```shell
(mmdet3d) xxx@xxx:~/det3d/mmdetection3d$ echo $DISPLAY
localhost:10.0
```
Then, in the VS Code terminal, set the DISPLAY environment variable to the same display number (10.0) and verify:
```shell
(mmdet3d) xxx@xxx:~/det3d/mmdetection3d$ export DISPLAY=:10.0
(mmdet3d) xxx@xxx:~/det3d/mmdetection3d$ echo $DISPLAY
:10.0
```
Taking the PGD model as the example, run the following command in the terminal; the visualization outputs are saved to the folder passed via --show-dir (here ./outputs/pgd/pgd_nus_mini):

```shell
python tools/test.py configs/pgd/pgd_r101_caffe_fpn_gn-head_2x16_2x_nus-mini-mono3d_finetune.py work_dirs/pgd_r101_caffe_fpn_gn-head_2x16_2x_nus-mini-mono3d_finetune/latest.pth --show --show-dir ./outputs/pgd/pgd_nus_mini
```
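If you want quantitative metrics in addition to the rendered images, tools/test.py also accepts an --eval flag; a hedged sketch (the bbox metric name follows the nuScenes mono3D configs):

```shell
python tools/test.py configs/pgd/pgd_r101_caffe_fpn_gn-head_2x16_2x_nus-mini-mono3d_finetune.py work_dirs/pgd_r101_caffe_fpn_gn-head_2x16_2x_nus-mini-mono3d_finetune/latest.pth --eval bbox
```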
The visualization results are as follows:
KITTI
(visualization screenshots of the KITTI detection results)
nuScenes-Mini
(visualization screenshots of the nuScenes-Mini detection results)
As you can see, there are many redundant boxes in the detections, most likely due to the NMS and score threshold settings. Let's adjust them. Taking PGD on nuScenes as the example, modify the test parameters in the configs/pgd/pgd_r101_caffe_fpn_gn-head_2x16_2x_nus-mini-mono3d_finetune.py file:

```python
test_cfg=dict(nms_pre=1000, nms_thr=0.2, score_thr=0.1, max_per_img=200))
```

That is, tighten the NMS threshold to nms_thr=0.2 and raise the score threshold to score_thr=0.1, then run the test and visualization again:
(visualization screenshots after adjusting the thresholds)
References
3D Object Detection for Autonomous Driving: A Review and New Outlooks
nuScenes dataset
MMDetection3D documentation: Vision-based 3D Detection
Questions about FCOS3D and PGD model 3D box