Visualizing multi-modality model inference results with mmdetection3d

This post covers visualization in mmdetection3d.

References:

A Hands-on Guide to 3D Detection and Segmentation (Part 3): Fun Visualization - Zhihu (zhihu.com)

Welcome to MMDetection3D's documentation! - MMDetection3D 1.0.0rc4 documentation

1. Introduction

Let's see what ChatGPT has to say [doge]:

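mmdetection3D is a PyTorch-based toolbox for 3D object detection and the 3D extension of mmdetection. It provides a flexible and efficient framework for training and testing a variety of 3D object detection algorithms on point clouds, volumetric data, and mixed data.

Its main features include:

Flexible modular design: the modular design lets users build their own detection algorithms and swap components such as backbones, heads, and loss functions with little effort.

Efficient training and inference: mmdetection3D supports distributed training across multiple GPUs and multiple machines. It also supports multi-threaded data loading and multi-process testing, which speeds up both training and inference.

Support for diverse data formats: mmdetection3D supports several 3D data formats, including point clouds, volumetric data, and mixed data. It also supports a range of data augmentation methods, which generate more training data and improve model robustness.

Easy-to-use API: mmdetection3D provides an easy-to-use API for model training, testing, and deployment.

In short, mmdetection3D is a powerful and easy-to-use 3D object detection toolbox that helps researchers and engineers develop and deploy 3D object detection algorithms quickly.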

mmdetection3d also ships many pretrained models, which make it easy to get a feel for the whole 3D object detection workflow.

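2. Goal

The goal here is simple: run inference with a pretrained model and draw the results onto the image so they are easy to inspect.

In practice I need the results to be as accurate as possible, so I chose MVXNet, a multi-modality fusion detector.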

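3. API overview

mmdetection3d provides many ready-made APIs that we can use directly. The ones used in this post are:

init_model: builds a model from a config file and an optional checkpoint file
build_dataset: builds a dataset from a config file, giving an indexable collection of all samples
show_multi_modality_result: projects 3D bboxes onto the image and saves the result
inference_multi_modality_detector: runs inference with a multi-modality detector; its parameters are explained in detail below. If you are not using a multi-modality model, other inference APIs can be used here instead; I may write about them when I have time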

4. Hands-on

4.1 Understanding inference_multi_modality_detector

Let's start with the key function, inference_multi_modality_detector, which lives in mmdet3d/apis/inference.py. I won't paste the code; here are its parameters:

Args:
    model (nn.Module): The loaded detector.
    pcd (str): Point cloud files.
    image (str): Image files.
    ann_file (str): Annotation files.
Returns:
    tuple: Predicted results and data from pipeline.

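It takes a model, a point cloud file, an image file, and an annotation file. The model, point cloud, and image are self-explanatory, but ann_file may be puzzling. A sample annotation file is provided under demo/data/kitti, and the following code shows what it contains: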

import pickle
with open('demo/data/kitti/kitti_000008_infos.pkl', 'rb') as f:
    data = pickle.load(f)
    print(data)

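The output is shown below. It is a list of length 1 whose single item is a dict recording the image and point cloud paths, the camera intrinsics and extrinsics, the projection matrices, the annotations, and so on: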

[{'image': {'image_idx': 8, 'image_path': 'training/image_2/000008.png', 'image_shape': array([ 375, 1242])}, 'point_cloud': {'num_features': 4, 'velodyne_path': 'training/velodyne/000008.bin'}, 'calib': {'P0': array([[721.5377,   0.    , 609.5593,   0.    ],
       [  0.    , 721.5377, 172.854 ,   0.    ],
       [  0.    ,   0.    ,   1.    ,   0.    ],
       [  0.    ,   0.    ,   0.    ,   1.    ]]), 'P1': array([[ 721.5377,    0.    ,  609.5593, -387.5744],
       [   0.    ,  721.5377,  172.854 ,    0.    ],
       [   0.    ,    0.    ,    1.    ,    0.    ],
       [   0.    ,    0.    ,    0.    ,    1.    ]]), 'P2': array([[7.215377e+02, 0.000000e+00, 6.095593e+02, 4.485728e+01],
       [0.000000e+00, 7.215377e+02, 1.728540e+02, 2.163791e-01],
       [0.000000e+00, 0.000000e+00, 1.000000e+00, 2.745884e-03],
       [0.000000e+00, 0.000000e+00, 0.000000e+00, 1.000000e+00]]), 'P3': array([[ 7.215377e+02,  0.000000e+00,  6.095593e+02, -3.395242e+02],
       [ 0.000000e+00,  7.215377e+02,  1.728540e+02,  2.199936e+00],
       [ 0.000000e+00,  0.000000e+00,  1.000000e+00,  2.729905e-03],
       [ 0.000000e+00,  0.000000e+00,  0.000000e+00,  1.000000e+00]]), 'R0_rect': array([[ 0.9999239 ,  0.00983776, -0.00744505,  0.        ],
       [-0.0098698 ,  0.9999421 , -0.00427846,  0.        ],
       [ 0.00740253,  0.00435161,  0.9999631 ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  1.        ]]), 'Tr_velo_to_cam': array([[ 7.533745e-03, -9.999714e-01, -6.166020e-04, -4.069766e-03],
       [ 1.480249e-02,  7.280733e-04, -9.998902e-01, -7.631618e-02],
       [ 9.998621e-01,  7.523790e-03,  1.480755e-02, -2.717806e-01],
       [ 0.000000e+00,  0.000000e+00,  0.000000e+00,  1.000000e+00]]), 'Tr_imu_to_velo': array([[ 9.999976e-01,  7.553071e-04, -2.035826e-03, -8.086759e-01],
       [-7.854027e-04,  9.998898e-01, -1.482298e-02,  3.195559e-01],
       [ 2.024406e-03,  1.482454e-02,  9.998881e-01, -7.997231e-01],
       [ 0.000000e+00,  0.000000e+00,  0.000000e+00,  1.000000e+00]])}, 'annos': {'name': array(['Car', 'Car', 'Car', 'Car', 'Car', 'Car', 'DontCare', 'DontCare',
       'DontCare', 'DontCare'], dtype='<U8'), 'truncated': array([ 0.88,  0.  ,  0.34,  0.  ,  0.  ,  0.  , -1.  , -1.  , -1.  ,
       -1.  ]), 'occluded': array([ 3,  1,  3,  1,  0,  0, -1, -1, -1, -1], dtype=int64), 'alpha': array([ -0.69,   2.04,  -1.84,  -1.33,   1.74,  -1.65, -10.  , -10.  ,
       -10.  , -10.  ]), 'bbox': array([[   0.  ,  192.37,  402.31,  374.  ],
       [ 334.85,  178.94,  624.5 ,  372.04],
       [ 937.29,  197.39, 1241.  ,  374.  ],
       [ 597.59,  176.18,  720.9 ,  261.14],
       [ 741.18,  168.83,  792.25,  208.43],
       [ 884.52,  178.31,  956.41,  240.18],
       [ 800.38,  163.67,  825.45,  184.07],
       [ 859.58,  172.34,  886.26,  194.51],
       [ 801.81,  163.96,  825.2 ,  183.59],
       [ 826.87,  162.28,  845.84,  178.86]]), 'dimensions': array([[ 3.23,  1.6 ,  1.57],
       [ 3.68,  1.57,  1.5 ],
       [ 3.08,  1.39,  1.44],
       [ 3.66,  1.47,  1.6 ],
       [ 4.08,  1.7 ,  1.63],
       [ 2.47,  1.59,  1.59],
       [-1.  , -1.  , -1.  ],
       [-1.  , -1.  , -1.  ],
       [-1.  , -1.  , -1.  ],
       [-1.  , -1.  , -1.  ]]), 'location': array([[   -2.7 ,     1.74,     3.68],
       [   -1.17,     1.65,     7.86],
       [    3.81,     1.64,     6.15],
       [    1.07,     1.55,    14.44],
       [    7.24,     1.55,    33.2 ],
       [    8.48,     1.75,    19.96],
       [-1000.  , -1000.  , -1000.  ],
       [-1000.  , -1000.  , -1000.  ],
       [-1000.  , -1000.  , -1000.  ],
       [-1000.  , -1000.  , -1000.  ]]), 'rotation_y': array([ -1.29,   1.9 ,  -1.31,  -1.25,   1.95,  -1.25, -10.  , -10.  ,
       -10.  , -10.  ]), 'score': array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]), 'index': array([ 0,  1,  2,  3,  4,  5, -1, -1, -1, -1]), 'group_ids': array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), 'difficulty': array([-1,  1, -1,  1,  1,  0, -1, -1, -1, -1]), 'num_points_in_gt': array([1325, 1900,  881,  659,   55,  162,   -1,   -1,   -1,   -1])}}]
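
The raw dump above is hard to scan. As a minimal sketch (the `summarize` helper is hypothetical and not part of mmdetection3d), the same structure can be printed with array shapes in place of array values. It relies only on the object having a `shape` attribute, so it works for NumPy arrays without importing NumPy; the small `sample` dict is a hand-written stand-in for the loaded pkl content.

```python
def summarize(obj, indent=0):
    """Return lines describing nested dicts/lists, showing shapes instead of array values."""
    pad = ' ' * indent
    lines = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            shape = getattr(value, 'shape', None)
            if isinstance(value, (dict, list)):
                lines.append(f"{pad}{key}:")
                lines.extend(summarize(value, indent + 2))
            elif shape is not None:
                # Array-like value: report its shape only
                lines.append(f"{pad}{key}: array{tuple(shape)}")
            else:
                lines.append(f"{pad}{key}: {value!r}")
    elif isinstance(obj, list):
        lines.append(f"{pad}list of {len(obj)} item(s)")
        if obj:
            # Describe the first item only; the others share its layout
            lines.extend(summarize(obj[0], indent + 2))
    else:
        lines.append(f"{pad}{obj!r}")
    return lines


# Hand-written stand-in for the pkl content (assumption: same nesting as above)
sample = [{'image': {'image_idx': 8, 'image_path': 'training/image_2/000008.png'},
           'point_cloud': {'num_features': 4}}]
print('\n'.join(summarize(sample)))
```

To use it on the real file, pass the `data` object returned by `pickle.load` instead of `sample`.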

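4.2 Getting the ann_file

To run inference with inference_multi_modality_detector you need this ann_file. After setting up the mmdetection3d environment, use the tools it provides to organize the dataset; this generates several pkl files (for point clouds) and json files (for images). Here we work with the KITTI dataset and take the generated training-set file kitti_infos_train.pkl. Inspecting it with the pkl-reading code above shows the same structure, only with a much longer list. Because inference_multi_modality_detector is easier to use with one annotation file per sample, we split the list and save each entry separately. The following code shows how to produce the ann_file for each point cloud/image pair: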

import os
import pickle

# Load the KITTI training-set info file
with open('pkl/kitti_infos_train.pkl', 'rb') as f:
    data = pickle.load(f)
print("Training set size:", len(data))

use_num = 50  # use the first 50 training samples (adjust as needed)
save_path = 'kitti_pkl_output/'
os.makedirs(save_path, exist_ok=True)  # make sure the output directory exists

print(data[0]['image']['image_path'].split('/')[-1].split('.')[0])  # extract the file name
for i in range(use_num):
    cur_data = [data[i]]  # keep the single-item list layout of the demo ann_file
    print(cur_data)
    # Derive the sample name from the image path, e.g. '000008'
    file_name = data[i]['image']['image_path'].split('/')[-1].split('.')[0]
    # Save one ann_file per sample
    with open(save_path + 'kitti_' + file_name + '_infos.pkl', 'wb') as f:
        pickle.dump(cur_data, f)

This produces a series of pkl files, one per sample.
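
The same naming convention (`kitti_<sample id>_infos.pkl`) is needed again at inference time to find the ann_file that belongs to an image. A small sketch of how it could be factored out follows; `ann_file_for` is a hypothetical helper name, not an mmdetection3d function, and it uses `os.path` rather than manual string splitting.

```python
import os


def ann_file_for(image_path, pkl_dir):
    """Map an image path such as 'training/image_2/000008.png' to its per-sample ann_file."""
    # '000008.png' -> '000008'
    stem = os.path.splitext(os.path.basename(image_path))[0]
    return os.path.join(pkl_dir, f"kitti_{stem}_infos.pkl")


print(ann_file_for('training/image_2/000008.png', 'kitti_pkl_output'))
```

Using one helper for both writing and reading keeps the two steps from drifting apart if the naming scheme ever changes.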

4.3 Building the model and running inference

Build the model

from mmdet3d.apis import init_model
# Build the pretrained model from a config file and a checkpoint
config_file = '/home/wistful/work/mmdetection3d/configs/my_config/my_dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class.py'
checkpoint_file = '/home/wistful/ResultDir/my_pth/mxvnet/dv_mvx-fpn_second_secfpn_adamw_2x8_80e_kitti-3d-3class_20210831_060805-83442923.pth'
model = init_model(config_file, checkpoint_file, device='cuda:0')

Build the dataset
```
# Build the dataset; the KITTI multi-modality data is used here because it is easy to visualize
from mmdet3d.datasets import build_dataset
from mmcv import Config
import os

os.chdir('/home/wistful/work/mmdetection3d/')
config_file = 'configs/_base_/datasets/kitti-3d-3class-multi.py'
cfg = Config.fromfile(config_file)

datasets = [build_dataset(cfg.data.train)]
```
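Why build the dataset from a config file like this? As mentioned in 4.1, inference_multi_modality_detector takes both a point cloud file and an image file as input. A dataset built this way makes them easy to obtain: each item contains the image meta information, the point cloud, the image, the ground truth, and so on, so we only need to iterate over it.

Run inference and save the visualization images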

# Iterate over n samples for visualization
import mmcv
from mmdet3d.core.visualizer import show_multi_modality_result
from mmdet3d.apis import inference_multi_modality_detector

out_dir = "/home/wistful/work/mmdetection3d/visual_img/kitti/"
pkl_dir = '/home/wistful/work/mmdetection3d/data/kitti/kitti_pkl_output/'

num = 50
for i in range(num):
    cur_data = datasets[0][i]  # i-th item of the dataset
    img_metas = cur_data.get('img_metas').data  # image meta information
    pts_file = img_metas.get('pts_filename')  # point cloud file path
    img_file_path = img_metas.get('filename')  # image file path
    # Load the original image to draw on (the original code passed an undefined name `image`)
    image = mmcv.imread(img_file_path)
    name = img_file_path.split('/')[-1].split('.')[0]  # sample name, e.g. '000008'
    ann_file = pkl_dir + 'kitti_' + name + '_infos.pkl'  # matching ann_file
    project_mat = img_metas.get('lidar2img')  # LiDAR-to-image projection matrix
    result, data = inference_multi_modality_detector(model, pts_file, img_file_path, ann_file)  # inference
    bboxes_data = result[0]['pts_bbox']['boxes_3d']  # predicted 3D bboxes
    # Save the visualization image
    show_multi_modality_result(img=image,
                               box_mode='lidar',
                               gt_bboxes=None,
                               img_metas=img_metas,
                               pred_bboxes=bboxes_data,
                               proj_mat=project_mat,
                               out_dir=out_dir,
                               filename=name,
                               show=False)
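
To make the role of the lidar2img projection matrix concrete, here is a dependency-free sketch of how a single LiDAR-frame point maps to a pixel. It assumes, as I understand the KITTI convention, that lidar2img is the product P2 @ R0_rect @ Tr_velo_to_cam of the calibration matrices shown in 4.1. The helpers `matmul` and `project_lidar_point` are hypothetical names written for illustration, and the calibration values are toy numbers, not the KITTI ones.

```python
def matmul(a, b):
    """Multiply two matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def project_lidar_point(point_xyz, lidar2img):
    """Project one LiDAR-frame point to pixel coordinates.

    Returns (u, v), or None when the point lies behind the camera.
    """
    x, y, z = point_xyz
    homo = [x, y, z, 1.0]  # homogeneous coordinates
    u, v, depth = (sum(row[k] * homo[k] for k in range(4))
                   for row in lidar2img[:3])
    if depth <= 0:
        return None
    # Perspective division turns the homogeneous result into pixels
    return u / depth, v / depth


# Toy calibration (hypothetical values): focal length 2, principal point (3, 5),
# and identity rectification/extrinsics, so the third coordinate acts as depth.
P2 = [[2.0, 0.0, 3.0, 0.0],
      [0.0, 2.0, 5.0, 0.0],
      [0.0, 0.0, 1.0, 0.0],
      [0.0, 0.0, 0.0, 1.0]]
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
lidar2img = matmul(matmul(P2, identity), identity)  # P2 @ R0_rect @ Tr_velo_to_cam

print(project_lidar_point((1.0, 2.0, 4.0), lidar2img))
```

Drawing a 3D bbox on the image amounts to projecting its eight corners this way and connecting them with lines, which is the kind of work show_multi_modality_result takes care of.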