yacs
Before the main part, it is worth getting familiar with the yacs library first, because all of the parameter configuration files in the SMOKE source code are built on top of yacs, and the code is hard to follow without it.
Introduction
yacs is a library for defining and managing parameter configurations (for example, the hyperparameters used to train a model or configurable model options). yacs uses yaml files to configure parameters. It grew out of the experiment configuration system used in py-faster-rcnn and Detectron.
Usage
- Install:
pip install yacs
- Create a defaults.py file and import the package:
from yacs.config import CfgNode as CN
- Create a CN() container to hold the parameters and add the parameters you need:
from yacs.config import CfgNode as CN

__C = CN()
__C.name = 'test'
__C.model = CN()  # nested CfgNode
__C.model.backbone = 'resnet'
__C.model.depth = 18
print(__C)
'''
model:
  backbone: resnet
  depth: 18
name: test
'''
- merge_from_file(): calling this method merges a yaml file into the defaults; wherever the yaml file specifies a value, it overrides the corresponding default (see the sketch below):
__C.merge_from_file("./test_config.yaml")
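For example, a minimal sketch of the override behaviour, continuing the __C defined above (the file name test_config.yaml and its contents are made up for illustration):

# suppose ./test_config.yaml contains only:
#   model:
#     depth: 50
__C.merge_from_file("./test_config.yaml")
print(__C.model.depth)     # 50       -> overridden by the yaml file
print(__C.model.backbone)  # 'resnet' -> defaults not mentioned in the yaml are kept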
- The defaults.py from the official SMOKE source code (default parameters):
import os
from yacs.config import CfgNode as CN
# -----------------------------------------------------------------------------
# Config definition
# -----------------------------------------------------------------------------
_C = CN()
_C.MODEL = CN()
_C.MODEL.SMOKE_ON = True
_C.MODEL.DEVICE = "cuda"
_C.MODEL.WEIGHT = ""
# -----------------------------------------------------------------------------
# INPUT
# -----------------------------------------------------------------------------
_C.INPUT = CN()
# Size of the smallest side of the image during training
_C.INPUT.HEIGHT_TRAIN = 384
# Maximum size of the side of the image during training
_C.INPUT.WIDTH_TRAIN = 1280
# Size of the smallest side of the image during testing
_C.INPUT.HEIGHT_TEST = 384
# Maximum size of the side of the image during testing
_C.INPUT.WIDTH_TEST = 1280
# Values to be used for image normalization
_C.INPUT.PIXEL_MEAN = [0.485, 0.456, 0.406] # kitti
# Values to be used for image normalization
_C.INPUT.PIXEL_STD = [0.229, 0.224, 0.225] # kitti
# Convert image to BGR format
_C.INPUT.TO_BGR = True
# Flip probability
_C.INPUT.FLIP_PROB_TRAIN = 0.5
# Shift and scale probability
_C.INPUT.SHIFT_SCALE_PROB_TRAIN = 0.3
_C.INPUT.SHIFT_SCALE_TRAIN = (0.2, 0.4)
# -----------------------------------------------------------------------------
# Dataset
# -----------------------------------------------------------------------------
_C.DATASETS = CN()
# List of the dataset names for training, as present in paths_catalog.py
_C.DATASETS.TRAIN = ()
# List of the dataset names for testing, as present in paths_catalog.py
_C.DATASETS.TEST = ()
# train split for dataset
_C.DATASETS.TRAIN_SPLIT = ""
# test split for dataset
_C.DATASETS.TEST_SPLIT = ""
_C.DATASETS.DETECT_CLASSES = ("Car",)
_C.DATASETS.MAX_OBJECTS = 30
# -----------------------------------------------------------------------------
# DataLoader
# -----------------------------------------------------------------------------
_C.DATALOADER = CN()
# Number of data loading threads
_C.DATALOADER.NUM_WORKERS = 4
# If > 0, this enforces that each collated batch should have a size divisible
# by SIZE_DIVISIBILITY
_C.DATALOADER.SIZE_DIVISIBILITY = 0
# If True, each batch should contain only images for which the aspect ratio
# is compatible. This groups portrait images together, and landscape images
# are not batched with portrait images.
_C.DATALOADER.ASPECT_RATIO_GROUPING = False
# ---------------------------------------------------------------------------- #
# Backbone options
# ---------------------------------------------------------------------------- #
_C.MODEL.BACKBONE = CN()
# The backbone conv body to use
# The string must match a function that is imported in modeling.model_builder
_C.MODEL.BACKBONE.CONV_BODY = "DLA-34-DCN"
# Add StopGrad at a specified stage so the bottom layers are frozen
_C.MODEL.BACKBONE.FREEZE_CONV_BODY_AT = 0
# Normalization for backbone
_C.MODEL.BACKBONE.USE_NORMALIZATION = "GN"
_C.MODEL.BACKBONE.DOWN_RATIO = 4
_C.MODEL.BACKBONE.BACKBONE_OUT_CHANNELS = 64
# ---------------------------------------------------------------------------- #
# Group Norm options
# ---------------------------------------------------------------------------- #
_C.MODEL.GROUP_NORM = CN()
# Number of dimensions per group in GroupNorm (-1 if using NUM_GROUPS)
_C.MODEL.GROUP_NORM.DIM_PER_GP = -1
# Number of groups in GroupNorm (-1 if using DIM_PER_GP)
_C.MODEL.GROUP_NORM.NUM_GROUPS = 32
# GroupNorm's small constant in the denominator
_C.MODEL.GROUP_NORM.EPSILON = 1e-5
# ---------------------------------------------------------------------------- #
# Heatmap Head options
# ---------------------------------------------------------------------------- #
# --------------------------SMOKE Head--------------------------------
_C.MODEL.SMOKE_HEAD = CN()
_C.MODEL.SMOKE_HEAD.PREDICTOR = "SMOKEPredictor"
_C.MODEL.SMOKE_HEAD.LOSS_TYPE = ("FocalLoss", "DisL1")
_C.MODEL.SMOKE_HEAD.LOSS_ALPHA = 2
_C.MODEL.SMOKE_HEAD.LOSS_BETA = 4
# Channels for regression
_C.MODEL.SMOKE_HEAD.REGRESSION_HEADS = 8
# Specific channel for (depth_offset, keypoint_offset, dimension_offset, orientation)
_C.MODEL.SMOKE_HEAD.REGRESSION_CHANNEL = (1, 2, 3, 2)
_C.MODEL.SMOKE_HEAD.USE_NORMALIZATION = "GN"
_C.MODEL.SMOKE_HEAD.NUM_CHANNEL = 256
# Loss weight for hm and reg loss
_C.MODEL.SMOKE_HEAD.LOSS_WEIGHT = (1., 10.)
# Reference car size in (length, height, width)
# for (car, cyclist, pedestrian)
_C.MODEL.SMOKE_HEAD.DIMENSION_REFERENCE = ((3.88, 1.63, 1.53),
                                           (1.78, 1.70, 0.58),
                                           (0.88, 1.73, 0.67))
# Reference depth
_C.MODEL.SMOKE_HEAD.DEPTH_REFERENCE = (28.01, 16.32)
_C.MODEL.SMOKE_HEAD.USE_NMS = False
# ---------------------------------------------------------------------------- #
# Solver
# ---------------------------------------------------------------------------- #
_C.SOLVER = CN()
_C.SOLVER.OPTIMIZER = "Adam"
_C.SOLVER.MAX_ITERATION = 14500
_C.SOLVER.STEPS = (5850, 9350)
_C.SOLVER.BASE_LR = 0.00025
_C.SOLVER.BIAS_LR_FACTOR = 2
_C.SOLVER.LOAD_OPTIMIZER_SCHEDULER = True
_C.SOLVER.CHECKPOINT_PERIOD = 20
_C.SOLVER.EVALUATE_PERIOD = 20
# Number of images per batch
# This is global, so if we have 8 GPUs and IMS_PER_BATCH = 16, each GPU will
# see 2 images per batch
_C.SOLVER.IMS_PER_BATCH = 32
_C.SOLVER.MASTER_BATCH = -1
# ---------------------------------------------------------------------------- #
# Test
# ---------------------------------------------------------------------------- #
_C.TEST = CN()
# Number of images per batch
# This is global, so if we have 8 GPUs and IMS_PER_BATCH = 16, each GPU will
# see 2 images per batch
_C.TEST.SINGLE_GPU_TEST = True
_C.TEST.IMS_PER_BATCH = 1
_C.TEST.PRED_2D = True
# Number of detections per image
_C.TEST.DETECTIONS_PER_IMG = 50
_C.TEST.DETECTIONS_THRESHOLD = 0.25
# ---------------------------------------------------------------------------- #
# Misc options
# ---------------------------------------------------------------------------- #
# Directory where output files are written
_C.OUTPUT_DIR = "./output/exp"
# Set seed to negative to fully randomize everything.
# Set seed to positive to use a fixed seed. Note that a fixed seed does not
# guarantee fully deterministic behavior.
_C.SEED = -1
# Benchmark different cudnn algorithms.
# If input images have very different sizes, this option will have large overhead
# for about 10k iterations. It usually hurts total time, but can benefit for certain models.
# If input images have the same or similar sizes, benchmark is often helpful.
_C.CUDNN_BENCHMARK = True
_C.PATHS_CATALOG = os.path.join(os.path.dirname(__file__), "paths_catalog.py")
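In practice a training script usually takes a fresh copy of these defaults and merges the experiment yaml (and any command-line overrides) on top of it. A minimal sketch of that common yacs pattern (the function name here is illustrative, not necessarily how the SMOKE repo wires it up):

def get_cfg_defaults():
    """Return a clone of the default config so the module-level _C stays untouched."""
    return _C.clone()

cfg = get_cfg_defaults()
cfg.merge_from_file("configs/smoke_gn_vector.yaml")  # experiment-specific overrides
cfg.merge_from_list(["SOLVER.IMS_PER_BATCH", 16])    # optional key/value overrides
cfg.freeze()                                         # make the config read-only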
- The smoke_gn_vector.yaml from the official SMOKE source code (experiment-specific parameters):
MODEL:
  WEIGHT: "catalog://ImageNetPretrained/DLA34"
INPUT:
  FLIP_PROB_TRAIN: 0.5
  SHIFT_SCALE_PROB_TRAIN: 0.3
DATASETS:
  DETECT_CLASSES: ("Car", "Cyclist", "Pedestrian")
  TRAIN: ("kitti_train",)
  TEST: ("kitti_test",)
  TRAIN_SPLIT: "trainval"
  TEST_SPLIT: "test"
SOLVER:
  BASE_LR: 2.5e-4
  STEPS: (10000, 18000)
  MAX_ITERATION: 25000
  IMS_PER_BATCH: 32
SMOKE
Preface
Liu, Z C, Wu Z Z, Tóth R. Smoke: Single-stage monocular 3d object detection via keypoint estimation[C]. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops. 2020: 996-997.
Paper
Official Code
MMDetection3D Code
Abstract
SMOKE is a one-stage monocular 3D detection model. It argues that a 2D detection branch is redundant for monocular 3D detection and introduces noise that hurts 3D performance, so it drops 2D detection entirely: each object is paired with a single keypoint, and the 3D bounding box of each detected object is predicted by combining the single keypoint estimate with regressed 3D variables.
Contributions
- Removes the 2D detection branch and instead estimates the 3D keypoints projected onto the image plane
- Proposes a multi-step disentanglement approach for 3D bounding box regression, separating the contribution of each parameter in both the 3D box encoding stage and the regression loss, which helps train the whole network effectively
Pipeline
The input image is passed through a DLA-34 network for feature extraction and then fed into two detection branches: a keypoint prediction branch and a 3D bounding box regression branch (a rough sketch of the two heads follows the list below).
- The keypoint prediction branch localizes foreground objects; its output resolution is $H/4 \times W/4 \times C$, where $C$ is the number of foreground classes in the dataset
- The 3D bounding box regression branch outputs a map of resolution $H/4 \times W/4 \times 8$, i.e. 8 parameters describing each 3D bounding box
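The two heads are plain convolutional branches on top of the stride-4 backbone feature map. A rough PyTorch sketch of the shapes involved (the layer choices are illustrative, loosely mirroring NUM_CHANNEL=256 and GN from the defaults above, not the exact SMOKE implementation):

import torch
import torch.nn as nn

num_classes = 3    # C: number of foreground classes
in_channels = 64   # channels of the backbone output (stride-4 feature map)

# keypoint / heatmap branch: H/4 x W/4 x C
heatmap_head = nn.Sequential(
    nn.Conv2d(in_channels, 256, kernel_size=3, padding=1),
    nn.GroupNorm(32, 256),
    nn.ReLU(inplace=True),
    nn.Conv2d(256, num_classes, kernel_size=1),
)

# 3D box regression branch: H/4 x W/4 x 8
regression_head = nn.Sequential(
    nn.Conv2d(in_channels, 256, kernel_size=3, padding=1),
    nn.GroupNorm(32, 256),
    nn.ReLU(inplace=True),
    nn.Conv2d(256, 8, kernel_size=1),
)

feat = torch.randn(1, in_channels, 96, 320)  # e.g. a 384x1280 input downsampled by 4
print(heatmap_head(feat).shape)     # torch.Size([1, 3, 96, 320])
print(regression_head(feat).shape)  # torch.Size([1, 8, 96, 320])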
Backbone
The backbone is a DLA-34 network (similar to CenterNet) equipped with deformable convolutions (DCN, Deformable Convolution Network) and GroupNorm (GN) normalization; the output resolution is one quarter of the input resolution. DLA-34 is chosen so that features from different layers can be aggregated. Two main modifications are made to the network:
- All hierarchical aggregation connections are replaced with deformable convolutions
- All BN layers are replaced with GN (GroupNorm), since GN is insensitive to batch size and more robust to training noise; the authors verify this in the experiments
Head Branch
SMOKE's detection head consists of two branches: keypoint detection and 3D bounding box regression.
- In the keypoint branch, each object in the image is represented by a single keypoint. The keypoint is defined as the projection of the object's 3D box center onto the image plane, not the center of the object's 2D box. In the figure below, the red point is the 2D box center and the orange point is the projection of the 3D box center onto the image plane.

- The 3D box regression branch predicts the information needed to construct the 3D bounding box, encoded as an 8-tuple $\tau = (\delta_z, \delta_{x_c}, \delta_{y_c}, \delta_h, \delta_w, \delta_l, \sin\alpha, \cos\alpha)^T$, where:
- $\delta_z$: depth offset of the object
- $\delta_{x_c}$: offset of the keypoint coordinate on the feature map in the x direction
- $\delta_{y_c}$: offset of the keypoint coordinate on the feature map in the y direction
- $\delta_h, \delta_w, \delta_l$: residuals of the object dimensions
- $\sin\alpha, \cos\alpha$: vectorized representation of the object's rotation angle
- Because the feature map is downsampled, the keypoint coordinate on the downsampled feature map is obtained by discretizing (flooring) the projected keypoint coordinate, which introduces a quantization error; the two predicted offsets $\delta_{x_c}$ and $\delta_{y_c}$ are there to compensate for it (a small numeric example follows)
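A small numeric example of the quantization error these two offsets absorb (the pixel coordinates are made up):

import math

down_ratio = 4
x, y = 501.0, 233.0  # projected 3D-center keypoint in the input image

# discretized keypoint location on the stride-4 feature map
x_c, y_c = math.floor(x / down_ratio), math.floor(y / down_ratio)  # (125, 58)

# offsets the network is trained to regress
delta_x = x / down_ratio - x_c  # 0.25
delta_y = y / down_ratio - y_c  # 0.25

# at inference the sub-pixel keypoint is recovered as
x_recovered = (x_c + delta_x) * down_ratio  # 501.0
y_recovered = (y_c + delta_y) * down_ratio  # 233.0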
Orientation
Orientation prediction is the trickiest part of SMOKE; for a detailed derivation see these two posts:
refer1
refer2
Here is my understanding:
- In the KITTI camera coordinate system, the yaw angle is $r_y$ and the observation angle is $\alpha$; they are related by $r_y = \alpha + \arctan(x/z)$, where the positive X direction corresponds to 0° and the positive Z direction to -90°
- SMOKE defines its own orientation coordinate system, with the X axis along the object's heading direction and the Z axis pointing to its left; angles are measured from the object-to-camera ray, taking rotation towards the Z axis (or X axis) as positive
- The yaw angle defined in SMOKE is $\theta = \alpha_z + \arctan\left(\frac{x}{z}\right)$, and it corresponds to the KITTI angles as $\theta \Leftrightarrow r_y$ and $\alpha_z \Leftrightarrow \alpha$
- The detailed derivation is illustrated in the figure below; a code sketch of the conversion follows after it

Loss
SMOKE's loss is the sum of a keypoint classification loss and a 3D bounding box regression loss.
- The keypoint classification loss $L_\mathrm{cls}$ follows the penalty-reduced focal loss of CornerNet and CenterNet, using a Gaussian kernel so that points near the ground-truth keypoint also receive (down-weighted) supervision
- The 3D bounding box regression loss $L_\mathrm{reg}$ follows the disentangled training strategy proposed in "Disentangling Monocular 3D Object Detection"; the regression targets are the eight 3D box parameters $(\delta_z, \delta_{x_c}, \delta_{y_c}, \delta_h, \delta_w, \delta_l, \sin\alpha, \cos\alpha)$. Using an L1 loss, the regression loss is defined as $L_{\mathrm{reg}}=\frac{\lambda}{N}\|\hat{B}-B\|_1$, where $\hat{B}$ is the prediction, $B$ is the ground truth, and the coefficient $\frac{\lambda}{N}$ balances the regression loss against the keypoint classification loss
- The total loss is $L=L_{\mathrm{cls}}+\sum_{i=1}^3 L_{\mathrm{reg}}(\hat{B}_i)$ (a rough sketch of both terms follows)
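A minimal PyTorch sketch of the two loss terms, assuming the usual CenterNet-style penalty-reduced focal loss with alpha=2 and beta=4 (matching LOSS_ALPHA/LOSS_BETA in defaults.py); this is illustrative rather than SMOKE's exact implementation:

import torch

def keypoint_focal_loss(pred, gt, alpha=2, beta=4, eps=1e-12):
    # pred: predicted heatmap after sigmoid, (B, C, H, W), values in (0, 1)
    # gt:   Gaussian-splatted ground-truth heatmap of the same shape
    pos = gt.eq(1).float()
    neg = gt.lt(1).float()
    neg_weights = torch.pow(1 - gt, beta)  # down-weight pixels near a true center

    pos_loss = torch.log(pred + eps) * torch.pow(1 - pred, alpha) * pos
    neg_loss = torch.log(1 - pred + eps) * torch.pow(pred, alpha) * neg_weights * neg

    num_pos = pos.sum().clamp(min=1)
    return -(pos_loss.sum() + neg_loss.sum()) / num_pos

def box_reg_loss(pred_boxes, gt_boxes, lam=1.0):
    # L1 regression loss  L_reg = lambda / N * ||B_hat - B||_1  over the N matched objects
    n = max(pred_boxes.shape[0], 1)
    return lam / n * torch.abs(pred_boxes - gt_boxes).sum()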
Run Code
There are two main versions of the SMOKE source code:
- The authors' official repository: https://github.com/lzccccc/SMOKE
- The OpenMMLab re-implementation in MMDetection3D: https://github.com/open-mmlab/mmdetection3d
From my own experience, going straight to the MMDetection3D version is the way to go (it really is easier to use). The official version currently only supports training and a basic test pass (and needs extra libraries installed), and many features are still missing; if you are interested you can still study it as an exercise in reading code.
MMDetection3D version (recommended)
https://github.com/open-mmlab/mmdetection3d
1. Create the environment
# create a new virtual environment in Anaconda
conda create -n mmdet3d python=3.7 -y
conda activate mmdet3d
# install the latest PyTorch
conda install -c pytorch pytorch torchvision -y
# install mmcv
pip install mmcv-full
# install mmdetection
pip install git+https://github.com/open-mmlab/mmdetection.git
# install mmsegmentation
pip install git+https://github.com/open-mmlab/mmsegmentation.git
# install mmdetection3d
git clone https://github.com/open-mmlab/mmdetection3d.git
cd mmdetection3d
pip install -v -e . # or "python setup.py develop"
# -v: verbose, more output
# -e: editable; changes to the local source take effect without reinstalling
2. Prepare the KITTI dataset
Follow the official MMDetection3D tutorial on preparing the KITTI dataset for 3D object detection.
3. Modify the parameters
- Dataset path: open the /mmdetection3d/configs/_base_/datasets/kitti-mono3d.py file and change data_root = '/your_datasets_root'
- Training parameters: open the /mmdetection3d/configs/smoke/smoke_dla34_pytorch_dlaneck_gn-all_8x4_6x_kitti-mono3d.py file and adjust the parameters as needed (e.g. max_epochs, the checkpoint saving interval, and so on); a rough sketch of these edits follows the list
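For reference, the edits involved look roughly like the following (field names follow the mmdetection3d config conventions; treat them as assumptions and check the files in your checkout):

# configs/_base_/datasets/kitti-mono3d.py -- point the loaders at your dataset
data_root = '/your_datasets_root/kitti/'

# configs/smoke/smoke_dla34_pytorch_dlaneck_gn-all_8x4_6x_kitti-mono3d.py
# the "6x" schedule corresponds to 72 epochs; depending on the version this is
# expressed either as `total_epochs = 72` or via the runner dict below
runner = dict(type='EpochBasedRunner', max_epochs=72)
checkpoint_config = dict(interval=10)  # save a checkpoint every 10 epochs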
4. Training
With the environment, dataset, and parameters set up, you can start training directly (multi-GPU training shown here):
CUDA_VISIBLE_DEVICES=0,1,2,3 tools/dist_train.sh configs/smoke/smoke_dla34_pytorch_dlaneck_gn-all_8x4_6x_kitti-mono3d.py 4
No output path is specified here, so results are saved by default to the /mmdetection3d/work_dirs/smoke_dla34_pytorch_dlaneck_gn-all_8x4_6x_kitti-mono3d/ folder.
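If you only have one GPU, the plain single-GPU entry point is tools/train.py (standard MMDetection3D usage; the work-dir below is just an example):

python tools/train.py configs/smoke/smoke_dla34_pytorch_dlaneck_gn-all_8x4_6x_kitti-mono3d.py --work-dir ./work_dirs/smoke_kitti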
5. Testing and visualization
Run the following command; its arguments are:
- [required] config: the config file
- [required] checkpoint: the weight file produced by training
- show: enable visualization
- show-dir: directory where the visualization results are written
python tools/test.py configs/smoke/smoke_dla34_pytorch_dlaneck_gn-all_8x4_6x_kitti-mono3d.py work_dirs/smoke_dla34_pytorch_dlaneck_gn-all_8x4_6x_kitti-mono3d/latest.pth --show --show-dir ./outputs/smoke/smoke_kitti_72e
The visualized results are then written to the show-dir folder.
6. A friendly reminder
For SMOKE you currently cannot control the number of 3D boxes shown in the visualization by changing the score_thr parameter. The reason is that SMOKE's detector class SMOKEMono3D inherits from SingleStageMono3DDetector, and SingleStageMono3DDetector does not yet implement score_thr filtering (this bug took me quite a while to track down). A hedged workaround sketch is given below.
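As a stopgap, you can filter the raw predictions by score yourself before they are visualized. The sketch below assumes the mono3d result layout ('img_bbox' holding 'boxes_3d', 'scores_3d', 'labels_3d'); verify the structure your mmdetection3d version actually returns before using it:

def filter_by_score(result, score_thr=0.3):
    # result: one element of the list returned by the test loop (assumed layout)
    det = result['img_bbox']
    keep = det['scores_3d'] > score_thr
    det['boxes_3d'] = det['boxes_3d'][keep]
    det['scores_3d'] = det['scores_3d'][keep]
    det['labels_3d'] = det['labels_3d'][keep]
    return result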
Official version (optional)
https://github.com/lzccccc/SMOKE
1. Create the environment
conda create -n smoke python=3.7 -y
conda activate smoke
pip install torch==1.4.0 torchvision==0.5.0
git clone https://github.com/lzccccc/SMOKE
cd smoke
python setup.py build develop
2. Add a requirements file: in the smoke root directory, create a requirements.txt file with the following packages:
shapely
tqdm
tensorboard
tensorboardX
scikit-image
matplotlib
yacs
pyyaml
fire
pycocotools
fvcore
opencv-python
numba
inplace_abn
Then install them from the command line with pip install -r requirements.txt
3. Download and set up the KITTI dataset
For the download steps see the post 【MMDetection3D】環(huán)境搭建,使用PointPillers訓(xùn)練&測試&可視化KITTI數(shù)據(jù)集. After downloading, organize the dataset as follows:
kitti
├── training
│   ├── calib
│   ├── label_2
│   ├── image_2
│   └── ImageSets
└── testing
    ├── calib
    ├── image_2
    └── ImageSets
4. Modify the dataset path
Option 1: symlink the downloaded KITTI dataset into the datasets folder and leave everything else alone; the default path is datasets/kitti/. However, with this option the later test stage may fail to find files:
mkdir datasets
ln -s /path_to_kitti_dataset datasets/kitti
Option 2 (recommended): open /smoke/smoke/config/paths_catalog.py and edit the dataset path directly:
class DatasetCatalog():
    DATA_DIR = "your_datasets_root/"
    DATASETS = {
        "kitti_train": {
            "root": "kitti/training/",
        },
        "kitti_test": {
            "root": "kitti/testing/",
        },
    }
5. Modify the training settings (optional)
Open the /smoke/configs/smoke_gn_vector.yaml file to change training parameters such as the number of iterations, batch size, and so on:
# model settings
MODEL:
  WEIGHT: "catalog://ImageNetPretrained/DLA34"
# input / dataset settings
INPUT:
  FLIP_PROB_TRAIN: 0.5
  SHIFT_SCALE_PROB_TRAIN: 0.3
DATASETS:
  DETECT_CLASSES: ("Car", "Cyclist", "Pedestrian")
  TRAIN: ("kitti_train",)
  TEST: ("kitti_test",)
  TRAIN_SPLIT: "trainval"
  TEST_SPLIT: "test"
# training (solver) settings
SOLVER:
  BASE_LR: 2.5e-4
  STEPS: (10000, 15000)
  MAX_ITERATION: 20000  # number of iterations
  IMS_PER_BATCH: 8      # total batch size across all GPUs
6. Full parameter settings
The /smoke/smoke/config/defaults.py file holds every configurable parameter, covering dataset input, preprocessing, model structure, training, and testing. It is best to leave this file alone; if you need to change a parameter, do it in the smoke_gn_vector.yaml file from the previous step. For example, to change where training and test results are saved, simply append at the end:
# model settings
MODEL:
  WEIGHT: "catalog://ImageNetPretrained/DLA34"
# input / dataset settings
INPUT:
  FLIP_PROB_TRAIN: 0.5
  SHIFT_SCALE_PROB_TRAIN: 0.3
DATASETS:
  DETECT_CLASSES: ("Car", "Cyclist", "Pedestrian")
  TRAIN: ("kitti_train",)
  TEST: ("kitti_test",)
  TRAIN_SPLIT: "trainval"
  TEST_SPLIT: "test"
# training (solver) settings
SOLVER:
  BASE_LR: 2.5e-4
  STEPS: (10000, 15000)
  MAX_ITERATION: 20000  # number of iterations
  IMS_PER_BATCH: 8      # total batch size across all GPUs
# output directory
OUTPUT_DIR: "./output/exp"
7. Start training
- Single-GPU training:
python tools/plain_train_net.py --config-file "configs/smoke_gn_vector.yaml"
- Multi-GPU training:
python tools/plain_train_net.py --num-gpus 4 --config-file "configs/smoke_gn_vector.yaml"
- The first training run automatically downloads the pretrained weights dla34-ba72cf86.pth. The download can be very slow (the host may be hard to reach without a proxy); you can also download the file yourself and place it at /root/.torch/models/dla34-ba72cf86.pth
8. Testing
The official SMOKE code has several problems at test time; the author gives a solution in this issue (see the Reference list):
You need to put offline kitti eval code under the folder “/smoke/data/datasets/evaluation/kitti/kitti_eval”
if you are using the train/val split. It will compile it automatically and evaluate the performance.
The eval code can be found here:
https://github.com/prclibo/kitti_eval (for 11 recall points)
https://github.com/lzccccc/kitti_eval_offline (for 40 recall points)
However, if you are using the trainval (namely the whole training set), there is no need to evaluate it offline. You need to log in to the kitti webset and submit your result.
The concrete test steps are:
- Download kitti_eval into the /smoke/smoke/data/datasets/evaluation/kitti/ folder
- Change the test-set settings: open the /smoke/configs/smoke_gn_vector.yaml file and change the DATASETS section to:
DATASETS:
  DETECT_CLASSES: ("Car", "Cyclist", "Pedestrian")
  TRAIN: ("kitti_train",)
  TEST: ("kitti_train",)
  TRAIN_SPLIT: "train"
  TEST_SPLIT: "val"
- Modify the do_kitti_detection_evaluation function in /smoke/smoke/data/datasets/evaluation/kitti/kitti_eval.py:
def do_kitti_detection_evaluation(dataset,
                                  predictions,
                                  output_folder,
                                  logger):
    predict_folder = os.path.join(output_folder, 'data')  # only recognize data
    mkdir(predict_folder)

    # write one KITTI-format txt file per image
    for image_id, prediction in predictions.items():
        predict_txt = image_id + '.txt'
        predict_txt = os.path.join(predict_folder, predict_txt)
        generate_kitti_3d_detection(prediction, predict_txt)

    logger.info("Evaluate on KITTI dataset")
    output_dir = os.path.abspath(output_folder)
    os.chdir('./smoke/data/datasets/evaluation/kitti/kitti_eval')
    # os.chdir('../smoke/data/datasets/evaluation/kitti/kitti_eval')
    label_dir = getattr(dataset, 'label_dir')
    # compile the offline evaluation binary on first use
    if not os.path.isfile('evaluate_object_3d_offline'):
        subprocess.Popen('g++ -O3 -DNDEBUG -o evaluate_object_3d_offline evaluate_object_3d_offline.cpp', shell=True)
    command = "./evaluate_object_3d_offline {} {}".format(label_dir, output_dir)
    output = subprocess.check_output(command, shell=True, universal_newlines=True).strip()
    logger.info(output)
    os.chdir('./')
    # os.chdir('../')
- Start the test. Currently only single-GPU testing is supported, and only txt prediction files are produced, with no visualization (I will try to add visualization later)
- The ckpt argument is the final model weights produced by training:
python tools/plain_train_net.py --eval-only --ckpt YOUR_CKPT --config-file "configs/smoke_gn_vector.yaml"
The logic of this test step is:
- First the dataset (kitti_train) is loaded and run through the trained model to produce the predictions (output)
- Then the code changes into the kitti_eval folder and runs g++ -O3 -DNDEBUG -o evaluate_object_3d_offline evaluate_object_3d_offline.cpp to compile the evaluate_object_3d_offline binary
- Finally, inside the kitti_eval folder, it runs ./evaluate_object_3d_offline /your_root_dir/kitti/training/label_2/ /your_root_dir/smoke/output/exp4/inference/kitti_train to compute the metrics (the equivalent manual commands are shown below)
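If you prefer to run the evaluation by hand, the same two steps look like this (paths are placeholders):

cd smoke/data/datasets/evaluation/kitti/kitti_eval
g++ -O3 -DNDEBUG -o evaluate_object_3d_offline evaluate_object_3d_offline.cpp
./evaluate_object_3d_offline /your_root_dir/kitti/training/label_2/ /your_root_dir/smoke/output/exp4/inference/kitti_train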
Note! This testing step has a lot of pitfalls:
- If you hit the error below: locate the raising call inside the subprocess module (around line 412; the exact line differs between Python versions) and change check to False (a less invasive alternative is sketched right after the error message):
subprocess.CalledProcessError: Command './evaluate_object_3d_offline datasets/kitti/training/label_2 /home/rrl/det3d/smoke/output/exp4/inference/kitti_train' returned non-zero exit status 127.
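Rather than patching the standard library's subprocess.py, an arguably cleaner fix is to make the call in do_kitti_detection_evaluation() tolerant of a non-zero exit status; a hedged sketch (adapt it to the surrounding code):

# replace the check_output call with a non-raising subprocess.run (sketch):
proc = subprocess.run(command, shell=True, universal_newlines=True,
                      stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
logger.info(proc.stdout.strip())
if proc.returncode != 0:
    logger.warning("evaluate_object_3d_offline exited with status %d", proc.returncode)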

- If you get an error like the following, double-check the path of the training set's label_2 folder; use an absolute path rather than a symlink (I used a symlink at first and kept hitting this error):
Thank you for participating in our evaluation!
Loading detections...
number of files for evaluation: 3769
ERROR: Couldn't read: 006071.txt of ground truth. Please write me an email!
An error occured while processing your results.
The generated test files end up under the inference folder of the output directory (e.g. output/exp4/inference/kitti_train).
9. Visualizing the predictions
Coming soon…
Reference
yacs的使用小記
https://github.com/lzccccc/SMOKE/issues/4
[CVPRW 2020] SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation 論文閱讀
Apollo 7.0障礙物感知模型原型!SMOKE 單目3D目標(biāo)檢測,代碼開源!