本文介紹在3D目標(biāo)檢測中,理解和使用KITTI 數(shù)據(jù)集,包括KITTI 的基本情況、下載數(shù)據(jù)集、標(biāo)簽格式解析、3D框可視化、點(diǎn)云轉(zhuǎn)圖像、畫BEV鳥瞰圖等,并配有實(shí)現(xiàn)代碼。
目錄
?1、KITTI數(shù)據(jù)集3D框可視化
2、KITTI 3D數(shù)據(jù)集
3、下載數(shù)據(jù)集
4、標(biāo)簽格式
5、標(biāo)定參數(shù)解析
6、點(diǎn)云數(shù)據(jù)-->投影到圖像
7、圖像數(shù)據(jù)-->投影到點(diǎn)云
8、可視化圖像2D結(jié)果、3D結(jié)果
9、點(diǎn)云3D結(jié)果-->圖像BEV鳥瞰圖結(jié)果(坐標(biāo)系轉(zhuǎn)換)
10、繪制BEV鳥瞰圖
11、BEV鳥瞰圖畫2d框
12、完整工程代碼
?1、KITTI數(shù)據(jù)集3D框可視化
2、KITTI 3D數(shù)據(jù)集
kitti 3D數(shù)據(jù)集的基本情況:
KITTI整個數(shù)據(jù)集是在德國卡爾斯魯厄采集的,采集時長6小時。KITTI官網(wǎng)放出的數(shù)據(jù)大約占采集全部的25%,去除了測試集中相關(guān)的數(shù)據(jù)片段,按場景可以分為“道路”、“城市”、“住宅區(qū)”、“校園”和“行人”5類。
傳感器配置:
傳感器安裝位置:
3、下載數(shù)據(jù)集
The KITTI Vision Benchmark Suite (cvlibs.net)
下載數(shù)據(jù)需要注冊賬號的,獲取取百度網(wǎng)盤下載;文件的格式如下所示
圖片格式:xxx.jpg
點(diǎn)云格式:xxx.bin(點(diǎn)云是以bin二進(jìn)制的方式存儲的)
標(biāo)定參數(shù):xxx.txt(一個文件中包括各個相機(jī)的內(nèi)參、畸變校正矩陣、激光雷達(dá)坐標(biāo)轉(zhuǎn)到相機(jī)坐標(biāo)的矩陣、IMU坐標(biāo)轉(zhuǎn)到激光雷達(dá)坐標(biāo)的矩陣)
標(biāo)簽格式:xxx.txt(包含類別、截?cái)嗲闆r、遮擋情況、觀測角度、2D框左上角坐標(biāo)、2D框右下角坐標(biāo)、3D物體的尺寸-高寬長、3D物體的中心坐標(biāo)-xyz、置信度)
4、標(biāo)簽格式
示例標(biāo)簽:Pedestrian 0.00 0 -0.20 712.40 143.00 810.73 307.92 1.89 0.48 1.20 1.84 1.47 8.41 0.01?
這時可以看看這個視頻:
Nuscenes、KITTI等多個BEV開源數(shù)據(jù)集介紹
5、標(biāo)定參數(shù)解析
然后看一下標(biāo)定參數(shù):
P0-P3:是各個相機(jī)的內(nèi)參矩陣;3×4的相機(jī)投影矩陣,0~3分別對應(yīng)左側(cè)灰度相機(jī)、右側(cè)灰度相機(jī)、左側(cè)彩色相機(jī)、右側(cè)彩色相機(jī)。
R0_rect: 是左相機(jī)的畸變矯正矩陣;3×3的旋轉(zhuǎn)修正矩陣。
Tr_velo_to_cam:是激光雷達(dá)坐標(biāo)系 轉(zhuǎn)到 相機(jī)坐標(biāo)系矩陣;3×4的激光坐標(biāo)系到Cam 0坐標(biāo)系的變換矩陣。
Tr_imu_to_velo: 是IMU坐標(biāo)轉(zhuǎn)到激光雷達(dá)坐標(biāo)的矩陣;3×4的IMU坐標(biāo)系到激光坐標(biāo)系的變換矩陣。
6、點(diǎn)云數(shù)據(jù)-->投影到圖像
當(dāng)有了點(diǎn)云數(shù)據(jù)信息,如何投影到圖像中呢?本質(zhì)上是一個坐標(biāo)系轉(zhuǎn)換的問題,流程思路如下:
- 已知點(diǎn)云坐標(biāo)(x,y,z),當(dāng)前是處于激光雷達(dá)坐標(biāo)系
- 激光雷達(dá)坐標(biāo)系 轉(zhuǎn)到 相機(jī)坐標(biāo)系,需要用到標(biāo)定參數(shù)中的Tr_velo_to_cam矩陣,此時得到相機(jī)坐標(biāo)(x1,y1,z1)
- 相機(jī)坐標(biāo)系進(jìn)行畸變矯正,需要用到標(biāo)定參數(shù)中的R0_rect矩陣,此時得到相機(jī)坐標(biāo)(x2,y2,z2)
- 相機(jī)坐標(biāo)系轉(zhuǎn)為圖像坐標(biāo)系,需要用到標(biāo)定參數(shù)中的P0矩陣,即相機(jī)內(nèi)存矩陣,此時得到圖像坐標(biāo)(u,v)
看一下示例效果:
接口代碼:
'''
將點(diǎn)云數(shù)據(jù)投影到圖像
'''
def show_lidar_on_image(pc_velo, img, calib, img_width, img_height):
''' Project LiDAR points to image '''
imgfov_pc_velo, pts_2d, fov_inds = get_lidar_in_image_fov(pc_velo,
calib, 0, 0, img_width, img_height, True)
imgfov_pts_2d = pts_2d[fov_inds,:]
imgfov_pc_rect = calib.project_velo_to_rect(imgfov_pc_velo)
import matplotlib.pyplot as plt
cmap = plt.cm.get_cmap('hsv', 256)
cmap = np.array([cmap(i) for i in range(256)])[:,:3]*255
for i in range(imgfov_pts_2d.shape[0]):
depth = imgfov_pc_rect[i,2]
color = cmap[int(640.0/depth),:]
cv2.circle(img, (int(np.round(imgfov_pts_2d[i,0])),
int(np.round(imgfov_pts_2d[i,1]))),
2, color=tuple(color), thickness=-1)
Image.fromarray(img).save('save_output/lidar_on_image.png')
Image.fromarray(img).show()
return img
核心代碼:
'''
將點(diǎn)云數(shù)據(jù)投影到相機(jī)坐標(biāo)系
'''
def get_lidar_in_image_fov(pc_velo, calib, xmin, ymin, xmax, ymax,
return_more=False, clip_distance=2.0):
''' Filter lidar points, keep those in image FOV '''
pts_2d = calib.project_velo_to_image(pc_velo)
fov_inds = (pts_2d[:,0]<xmax) & (pts_2d[:,0]>=xmin) & \
(pts_2d[:,1]<ymax) & (pts_2d[:,1]>=ymin)
fov_inds = fov_inds & (pc_velo[:,0]>clip_distance)
imgfov_pc_velo = pc_velo[fov_inds,:]
if return_more:
return imgfov_pc_velo, pts_2d, fov_inds
else:
return imgfov_pc_velo
7、圖像數(shù)據(jù)-->投影到點(diǎn)云
當(dāng)有了圖像RGB信息,如何投影到點(diǎn)云中呢?本質(zhì)上是一個坐標(biāo)系轉(zhuǎn)換的問題,和上面的是逆過程,流程思路如下:
- 已知圖像坐標(biāo)(u,v),當(dāng)前是處于圖像坐標(biāo)系
- 圖像坐標(biāo)系 轉(zhuǎn) 相機(jī)坐標(biāo)系,需要用到標(biāo)定參數(shù)中的P0逆矩陣,即相機(jī)內(nèi)存矩陣,得到相機(jī)坐標(biāo)(x,y,z)
- 相機(jī)坐標(biāo)系進(jìn)行畸變矯正,需要用到標(biāo)定參數(shù)中的R0_rect逆矩陣,得到相機(jī)坐標(biāo)(x1,y1,z1)
- 矯正后相機(jī)坐標(biāo)系 轉(zhuǎn)?激光雷達(dá)坐標(biāo)系,需要用到標(biāo)定參數(shù)中的Tr_velo_to_cam逆矩陣,此時得到激光雷達(dá)坐標(biāo)(x2,y2,z2)
8、可視化圖像2D結(jié)果、3D結(jié)果
先看一下2D框的效果:
3D框的效果:
?接口代碼:
'''
在圖像中畫2D框、3D框
'''
def show_image_with_boxes(img, objects, calib, show3d=True):
img1 = np.copy(img) # for 2d bbox
img2 = np.copy(img) # for 3d bbox
for obj in objects:
if obj.type=='DontCare':continue
cv2.rectangle(img1, (int(obj.xmin),int(obj.ymin)), (int(obj.xmax),int(obj.ymax)), (0,255,0), 2) # 畫2D框
box3d_pts_2d, box3d_pts_3d = utils.compute_box_3d(obj, calib.P) # 獲取圖像3D框(8*2)、相機(jī)坐標(biāo)系3D框(8*3)
img2 = utils.draw_projected_box3d(img2, box3d_pts_2d) # 在圖像上畫3D框
if show3d:
Image.fromarray(img2).save('save_output/image_with_3Dboxes.png')
Image.fromarray(img2).show()
else:
Image.fromarray(img1).save('save_output/image_with_2Dboxes.png')
Image.fromarray(img1).show()
核心代碼:
def compute_box_3d(obj, P):
'''
計(jì)算對象的3D邊界框在圖像平面上的投影
輸入: obj代表一個物體標(biāo)簽信息, P代表相機(jī)的投影矩陣-內(nèi)參。
輸出: 返回兩個值, corners_3d表示3D邊界框在 相機(jī)坐標(biāo)系 的8個角點(diǎn)的坐標(biāo)-3D坐標(biāo)。
corners_2d表示3D邊界框在 圖像上 的8個角點(diǎn)的坐標(biāo)-2D坐標(biāo)。
'''
# 計(jì)算一個繞Y軸旋轉(zhuǎn)的旋轉(zhuǎn)矩陣R,用于將3D坐標(biāo)從世界坐標(biāo)系轉(zhuǎn)換到相機(jī)坐標(biāo)系。obj.ry是對象的偏航角
R = roty(obj.ry)
# 物體實(shí)際的長、寬、高
l = obj.l;
w = obj.w;
h = obj.h;
# 存儲了3D邊界框的8個角點(diǎn)相對于對象中心的坐標(biāo)。這些坐標(biāo)定義了3D邊界框的形狀。
x_corners = [l/2,l/2,-l/2,-l/2,l/2,l/2,-l/2,-l/2];
y_corners = [0,0,0,0,-h,-h,-h,-h];
z_corners = [w/2,-w/2,-w/2,w/2,w/2,-w/2,-w/2,w/2];
# 1、將3D邊界框的角點(diǎn)坐標(biāo)從對象坐標(biāo)系轉(zhuǎn)換到相機(jī)坐標(biāo)系。它使用了旋轉(zhuǎn)矩陣R
corners_3d = np.dot(R, np.vstack([x_corners,y_corners,z_corners]))
# 3D邊界框的坐標(biāo)進(jìn)行平移
corners_3d[0,:] = corners_3d[0,:] + obj.t[0];
corners_3d[1,:] = corners_3d[1,:] + obj.t[1];
corners_3d[2,:] = corners_3d[2,:] + obj.t[2];
# 2、檢查對象是否在相機(jī)前方,因?yàn)橹挥性谙鄼C(jī)前方的對象才會被繪制。
# 如果對象的Z坐標(biāo)(深度)小于0.1,就意味著對象在相機(jī)后方,那么corners_2d將被設(shè)置為None,函數(shù)將返回None。
if np.any(corners_3d[2,:]<0.1):
corners_2d = None
return corners_2d, np.transpose(corners_3d)
# 3、將相機(jī)坐標(biāo)系下的3D邊界框的角點(diǎn),投影到圖像平面上,得到它們在圖像上的2D坐標(biāo)。
corners_2d = project_to_image(np.transpose(corners_3d), P);
return corners_2d, np.transpose(corners_3d)
def draw_projected_box3d(image, qs, color=(0,60,255), thickness=2):
'''
qs: 包含8個3D邊界框角點(diǎn)坐標(biāo)的數(shù)組, 形狀為(8, 2)。圖像坐標(biāo)下的3D框, 8個頂點(diǎn)坐標(biāo)。
'''
''' Draw 3d bounding box in image
qs: (8,2) array of vertices for the 3d box in following order:
1 -------- 0
/| /|
2 -------- 3 .
| | | |
. 5 -------- 4
|/ |/
6 -------- 7
'''
qs = qs.astype(np.int32) # 將輸入的頂點(diǎn)坐標(biāo)轉(zhuǎn)換為整數(shù)類型,以便在圖像上繪制。
# 這個循環(huán)迭代4次,每次處理一個邊界框的一條邊。
for k in range(0,4):
# Ref: http://docs.enthought.com/mayavi/mayavi/auto/mlab_helper_functions.html
# 定義了要繪制的邊的起始點(diǎn)和結(jié)束點(diǎn)的索引。在這個循環(huán)中,它用于繪制邊界框的前四條邊。
i,j=k,(k+1)%4
cv2.line(image, (qs[i,0],qs[i,1]), (qs[j,0],qs[j,1]), color, thickness)
# 定義了要繪制的邊的起始點(diǎn)和結(jié)束點(diǎn)的索引。在這個循環(huán)中,它用于繪制邊界框的后四條邊,與前四條邊平行
i,j=k+4,(k+1)%4 + 4
cv2.line(image, (qs[i,0],qs[i,1]), (qs[j,0],qs[j,1]), color, thickness)
# 定義了要繪制的邊的起始點(diǎn)和結(jié)束點(diǎn)的索引。在這個循環(huán)中,它用于繪制連接前四條邊和后四條邊的邊界框的邊。
i,j=k,k+4
cv2.line(image, (qs[i,0],qs[i,1]), (qs[j,0],qs[j,1]), color, thickness)
return image
9、點(diǎn)云3D結(jié)果-->圖像BEV鳥瞰圖結(jié)果(坐標(biāo)系轉(zhuǎn)換)
思路流程:
- 讀取點(diǎn)云數(shù)據(jù),點(diǎn)云得存儲格式是n*4,n是指當(dāng)前文件點(diǎn)云的數(shù)量,4分別表示(x,y,z,intensity),即點(diǎn)云的空間三維坐標(biāo)、反射強(qiáng)度
- 我們只需讀取前兩行即可,得到坐標(biāo)點(diǎn)(x,y)
- 然后將坐標(biāo)點(diǎn)(x,y),畫散點(diǎn)圖
BEV鳥瞰圖效果如下:
10、繪制BEV鳥瞰圖
BEV圖像示例效果:
核心代碼:
'''
可視化BEV鳥瞰圖
'''
def show_lidar_topview(pc_velo, objects, calib):
# 1-設(shè)置鳥瞰圖范圍
side_range = (-30, 30) # 左右距離
fwd_range = (0, 80) # 后前距離
x_points = pc_velo[:, 0]
y_points = pc_velo[:, 1]
z_points = pc_velo[:, 2]
# 2-獲得區(qū)域內(nèi)的點(diǎn)
f_filt = np.logical_and(x_points > fwd_range[0], x_points < fwd_range[1])
s_filt = np.logical_and(y_points > side_range[0], y_points < side_range[1])
filter = np.logical_and(f_filt, s_filt)
indices = np.argwhere(filter).flatten()
x_points = x_points[indices]
y_points = y_points[indices]
z_points = z_points[indices]
# 定義了鳥瞰圖中每個像素代表的距離
res = 0.1
# 3-1將點(diǎn)云坐標(biāo)系 轉(zhuǎn)到 BEV坐標(biāo)系
x_img = (-y_points / res).astype(np.int32)
y_img = (-x_points / res).astype(np.int32)
# 3-2調(diào)整坐標(biāo)原點(diǎn)
x_img -= int(np.floor(side_range[0]) / res)
y_img += int(np.floor(fwd_range[1]) / res)
print(x_img.min(), x_img.max(), y_img.min(), y_img.max())
# 4-填充像素值, 將點(diǎn)云數(shù)據(jù)的高度信息(Z坐標(biāo))映射到像素值
height_range = (-3, 1.0)
pixel_value = np.clip(a=z_points, a_max=height_range[1], a_min=height_range[0])
def scale_to_255(a, min, max, dtype=np.uint8):
return ((a - min) / float(max - min) * 255).astype(dtype)
pixel_value = scale_to_255(pixel_value, height_range[0], height_range[1])
# 創(chuàng)建圖像數(shù)組
x_max = 1 + int((side_range[1] - side_range[0]) / res)
y_max = 1 + int((fwd_range[1] - fwd_range[0]) / res)
im = np.zeros([y_max, x_max], dtype=np.uint8)
im[y_img, x_img] = pixel_value
im2 = Image.fromarray(im)
im2.save('save_output/BEV.png')
im2.show()
11、BEV鳥瞰圖畫2d框
在BEV視圖中畫框,可視化結(jié)果:
接口代碼:
'''
將點(diǎn)云數(shù)據(jù)3D框投影到BEV
'''
def show_lidar_topview_with_boxes(img, objects, calib):
def bbox3d(obj):
box3d_pts_2d, box3d_pts_3d = utils.compute_box_3d(obj, calib.P) # 獲取3D框-圖像、3D框-相機(jī)坐標(biāo)系
box3d_pts_3d_velo = calib.project_rect_to_velo(box3d_pts_3d) # 將相機(jī)坐標(biāo)系的框 轉(zhuǎn)到 激光雷達(dá)坐標(biāo)系
return box3d_pts_3d_velo # 返回nx3的點(diǎn)
boxes3d = [bbox3d(obj) for obj in objects if obj.type == "Car"]
gt = np.array(boxes3d)
im2 = utils.draw_box3d_label_on_bev(img, gt, scores=None, thickness=1) # 獲取激光雷達(dá)坐標(biāo)系的3D點(diǎn),選擇x, y兩維,畫到BEV平面坐標(biāo)系上
im2 = Image.fromarray(im2)
im2.save('save_output/BEV with boxes.png')
im2.show()
核心代碼:
# 設(shè)置BEV鳥瞰圖參數(shù)
side_range = (-30, 30) # 左右距離
fwd_range = (0, 80) # 后前距離
res = 0.1 # 分辨率0.05m
def compute_box_3d(obj, P):
'''
計(jì)算對象的3D邊界框在圖像平面上的投影
輸入: obj代表一個物體標(biāo)簽信息, P代表相機(jī)的投影矩陣-內(nèi)參。
輸出: 返回兩個值, corners_3d表示3D邊界框在 相機(jī)坐標(biāo)系 的8個角點(diǎn)的坐標(biāo)-3D坐標(biāo)。
corners_2d表示3D邊界框在 圖像上 的8個角點(diǎn)的坐標(biāo)-2D坐標(biāo)。
'''
# 計(jì)算一個繞Y軸旋轉(zhuǎn)的旋轉(zhuǎn)矩陣R,用于將3D坐標(biāo)從世界坐標(biāo)系轉(zhuǎn)換到相機(jī)坐標(biāo)系。obj.ry是對象的偏航角
R = roty(obj.ry)
# 物體實(shí)際的長、寬、高
l = obj.l;
w = obj.w;
h = obj.h;
# 存儲了3D邊界框的8個角點(diǎn)相對于對象中心的坐標(biāo)。這些坐標(biāo)定義了3D邊界框的形狀。
x_corners = [l/2,l/2,-l/2,-l/2,l/2,l/2,-l/2,-l/2];
y_corners = [0,0,0,0,-h,-h,-h,-h];
z_corners = [w/2,-w/2,-w/2,w/2,w/2,-w/2,-w/2,w/2];
# 1、將3D邊界框的角點(diǎn)坐標(biāo)從對象坐標(biāo)系轉(zhuǎn)換到相機(jī)坐標(biāo)系。它使用了旋轉(zhuǎn)矩陣R
corners_3d = np.dot(R, np.vstack([x_corners,y_corners,z_corners]))
# 3D邊界框的坐標(biāo)進(jìn)行平移
corners_3d[0,:] = corners_3d[0,:] + obj.t[0];
corners_3d[1,:] = corners_3d[1,:] + obj.t[1];
corners_3d[2,:] = corners_3d[2,:] + obj.t[2];
# 2、檢查對象是否在相機(jī)前方,因?yàn)橹挥性谙鄼C(jī)前方的對象才會被繪制。
# 如果對象的Z坐標(biāo)(深度)小于0.1,就意味著對象在相機(jī)后方,那么corners_2d將被設(shè)置為None,函數(shù)將返回None。
if np.any(corners_3d[2,:]<0.1):
corners_2d = None
return corners_2d, np.transpose(corners_3d)
# 3、將相機(jī)坐標(biāo)系下的3D邊界框的角點(diǎn),投影到圖像平面上,得到它們在圖像上的2D坐標(biāo)。
corners_2d = project_to_image(np.transpose(corners_3d), P);
return corners_2d, np.transpose(corners_3d)
12、完整工程代碼
工程目錄:
kitti_vis_main.py(主代碼入口)
from __future__ import print_function
import os
import sys
import cv2
import os.path
from PIL import Image
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
ROOT_DIR = os.path.dirname(BASE_DIR)
sys.path.append(BASE_DIR)
sys.path.append(os.path.join(ROOT_DIR, 'mayavi'))
from kitti_object import *
def visualization():
import mayavi.mlab as mlab
dataset = kitti_object(os.path.join(ROOT_DIR, 'Kitti_3D_Vis/dataset/object')) # linux 路徑
data_idx = 10 # 選擇第幾張圖像
# 1-加載標(biāo)簽數(shù)據(jù)
objects = dataset.get_label_objects(data_idx)
print("There are %d objects.", len(objects))
# 2-加載圖像
img = dataset.get_image(data_idx)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
img_height, img_width, img_channel = img.shape
# 3-加載點(diǎn)云數(shù)據(jù)
pc_velo = dataset.get_lidar(data_idx)[:,0:3] # (x, y, z)
# 4-加載標(biāo)定參數(shù)
calib = dataset.get_calibration(data_idx)
# 5-可視化原始圖像
print(' ------------ show raw image -------- ')
Image.fromarray(img).show()
# 6-在圖像中畫2D框
print(' ------------ show image with 2D bounding box -------- ')
show_image_with_boxes(img, objects, calib, False)
# 7-在圖像中畫3D框
print(' ------------ show image with 3D bounding box ------- ')
show_image_with_boxes(img, objects, calib, True)
# 8-將點(diǎn)云數(shù)據(jù)投影到圖像
print(' ----------- LiDAR points projected to image plane -- ')
show_lidar_on_image(pc_velo, img, calib, img_width, img_height)
# 9-畫BEV圖
print('------------------ BEV of LiDAR points -----------------------------')
show_lidar_topview(pc_velo, objects, calib)
# 10-在BEV圖中畫2D框
print('--------------- BEV of LiDAR points with bobes ---------------------')
img1 = cv2.imread('save_output/BEV.png')
img = cv2.cvtColor(img1, cv2.COLOR_BGR2RGB)
show_lidar_topview_with_boxes(img1, objects, calib)
if __name__=='__main__':
visualization()
kitti_util.py
from __future__ import print_function
import numpy as np
import cv2
from PIL import Image
import os
# 設(shè)置BEV鳥瞰圖參數(shù)
side_range = (-30, 30) # 左右距離
fwd_range = (0, 80) # 后前距離
res = 0.1 # 分辨率0.05m
def compute_box_3d(obj, P):
'''
計(jì)算對象的3D邊界框在圖像平面上的投影
輸入: obj代表一個物體標(biāo)簽信息, P代表相機(jī)的投影矩陣-內(nèi)參。
輸出: 返回兩個值, corners_3d表示3D邊界框在 相機(jī)坐標(biāo)系 的8個角點(diǎn)的坐標(biāo)-3D坐標(biāo)。
corners_2d表示3D邊界框在 圖像上 的8個角點(diǎn)的坐標(biāo)-2D坐標(biāo)。
'''
# 計(jì)算一個繞Y軸旋轉(zhuǎn)的旋轉(zhuǎn)矩陣R,用于將3D坐標(biāo)從世界坐標(biāo)系轉(zhuǎn)換到相機(jī)坐標(biāo)系。obj.ry是對象的偏航角
R = roty(obj.ry)
# 物體實(shí)際的長、寬、高
l = obj.l;
w = obj.w;
h = obj.h;
# 存儲了3D邊界框的8個角點(diǎn)相對于對象中心的坐標(biāo)。這些坐標(biāo)定義了3D邊界框的形狀。
x_corners = [l/2,l/2,-l/2,-l/2,l/2,l/2,-l/2,-l/2];
y_corners = [0,0,0,0,-h,-h,-h,-h];
z_corners = [w/2,-w/2,-w/2,w/2,w/2,-w/2,-w/2,w/2];
# 1、將3D邊界框的角點(diǎn)坐標(biāo)從對象坐標(biāo)系轉(zhuǎn)換到相機(jī)坐標(biāo)系。它使用了旋轉(zhuǎn)矩陣R
corners_3d = np.dot(R, np.vstack([x_corners,y_corners,z_corners]))
# 3D邊界框的坐標(biāo)進(jìn)行平移
corners_3d[0,:] = corners_3d[0,:] + obj.t[0];
corners_3d[1,:] = corners_3d[1,:] + obj.t[1];
corners_3d[2,:] = corners_3d[2,:] + obj.t[2];
# 2、檢查對象是否在相機(jī)前方,因?yàn)橹挥性谙鄼C(jī)前方的對象才會被繪制。
# 如果對象的Z坐標(biāo)(深度)小于0.1,就意味著對象在相機(jī)后方,那么corners_2d將被設(shè)置為None,函數(shù)將返回None。
if np.any(corners_3d[2,:]<0.1):
corners_2d = None
return corners_2d, np.transpose(corners_3d)
# 3、將相機(jī)坐標(biāo)系下的3D邊界框的角點(diǎn),投影到圖像平面上,得到它們在圖像上的2D坐標(biāo)。
corners_2d = project_to_image(np.transpose(corners_3d), P);
return corners_2d, np.transpose(corners_3d)
def project_to_image(pts_3d, P):
'''
將相機(jī)坐標(biāo)系下的3D邊界框的角點(diǎn), 投影到圖像平面上, 得到它們在圖像上的2D坐標(biāo)
輸入: pts_3d是一個nx3的矩陣, 包含了待投影的3D坐標(biāo)點(diǎn)(每行一個點(diǎn)), P是相機(jī)的投影矩陣, 通常是一個3x4的矩陣。
輸出: 返回一個nx2的矩陣, 包含了投影到圖像平面上的2D坐標(biāo)點(diǎn)。
P(3x4) dot pts_3d_extended(4xn) = projected_pts_2d(3xn) => normalize projected_pts_2d(2xn)
<=> pts_3d_extended(nx4) dot P'(4x3) = projected_pts_2d(nx3) => normalize projected_pts_2d(nx2)
'''
n = pts_3d.shape[0] # 獲取3D點(diǎn)的數(shù)量
pts_3d_extend = np.hstack((pts_3d, np.ones((n,1)))) # 將每個3D點(diǎn)的坐標(biāo)擴(kuò)展為齊次坐標(biāo)形式(4D),通過在每個點(diǎn)的末尾添加1,創(chuàng)建了一個nx4的矩陣。
pts_2d = np.dot(pts_3d_extend, np.transpose(P)) # 將擴(kuò)展的3D坐標(biāo)點(diǎn)矩陣與投影矩陣P相乘,得到一個nx3的矩陣,其中每一行包含了3D點(diǎn)在圖像平面上的投影坐標(biāo)。每個點(diǎn)的坐標(biāo)表示為[x, y, z]。
pts_2d[:,0] /= pts_2d[:,2] # 將投影坐標(biāo)中的x坐標(biāo)除以z坐標(biāo),從而獲得2D圖像上的x坐標(biāo)。
pts_2d[:,1] /= pts_2d[:,2] # 將投影坐標(biāo)中的y坐標(biāo)除以z坐標(biāo),從而獲得2D圖像上的y坐標(biāo)。
return pts_2d[:,0:2] # 返回一個nx2的矩陣,其中包含了每個3D點(diǎn)在2D圖像上的坐標(biāo)。
def draw_projected_box3d(image, qs, color=(0,60,255), thickness=2):
'''
qs: 包含8個3D邊界框角點(diǎn)坐標(biāo)的數(shù)組, 形狀為(8, 2)。圖像坐標(biāo)下的3D框, 8個頂點(diǎn)坐標(biāo)。
'''
''' Draw 3d bounding box in image
qs: (8,2) array of vertices for the 3d box in following order:
1 -------- 0
/| /|
2 -------- 3 .
| | | |
. 5 -------- 4
|/ |/
6 -------- 7
'''
qs = qs.astype(np.int32) # 將輸入的頂點(diǎn)坐標(biāo)轉(zhuǎn)換為整數(shù)類型,以便在圖像上繪制。
# 這個循環(huán)迭代4次,每次處理一個邊界框的一條邊。
for k in range(0,4):
# Ref: http://docs.enthought.com/mayavi/mayavi/auto/mlab_helper_functions.html
# 定義了要繪制的邊的起始點(diǎn)和結(jié)束點(diǎn)的索引。在這個循環(huán)中,它用于繪制邊界框的前四條邊。
i,j=k,(k+1)%4
cv2.line(image, (qs[i,0],qs[i,1]), (qs[j,0],qs[j,1]), color, thickness)
# 定義了要繪制的邊的起始點(diǎn)和結(jié)束點(diǎn)的索引。在這個循環(huán)中,它用于繪制邊界框的后四條邊,與前四條邊平行
i,j=k+4,(k+1)%4 + 4
cv2.line(image, (qs[i,0],qs[i,1]), (qs[j,0],qs[j,1]), color, thickness)
# 定義了要繪制的邊的起始點(diǎn)和結(jié)束點(diǎn)的索引。在這個循環(huán)中,它用于繪制連接前四條邊和后四條邊的邊界框的邊。
i,j=k,k+4
cv2.line(image, (qs[i,0],qs[i,1]), (qs[j,0],qs[j,1]), color, thickness)
return image
def draw_box3d_label_on_bev(image, boxes3d, thickness=1, scores=None):
# if scores is not None and scores.shape[0] >0:
img = image.copy()
num = len(boxes3d)
for n in range(num):
b = boxes3d[n]
x0 = b[0, 0]
y0 = b[0, 1]
x1 = b[1, 0]
y1 = b[1, 1]
x2 = b[2, 0]
y2 = b[2, 1]
x3 = b[3, 0]
y3 = b[3, 1]
if (x0<30 and x1<30 and x2<30 and x3<30):
u0, v0 = lidar_to_top_coords(x0, y0)
u1, v1 = lidar_to_top_coords(x1, y1)
u2, v2 = lidar_to_top_coords(x2, y2)
u3, v3 = lidar_to_top_coords(x3, y3)
color = (0, 255, 0) # green
cv2.line(img, (u0, v0), (u1, v1), color, thickness, cv2.LINE_AA)
cv2.line(img, (u1, v1), (u2, v2), color, thickness, cv2.LINE_AA)
cv2.line(img, (u2, v2), (u3, v3), color, thickness, cv2.LINE_AA)
cv2.line(img, (u3, v3), (u0, v0), color, thickness, cv2.LINE_AA)
elif (x0<50 and x1<50 and x2<50 and x3<50):
color = (255, 0, 0) # red
u0, v0 = lidar_to_top_coords(x0, y0)
u1, v1 = lidar_to_top_coords(x1, y1)
u2, v2 = lidar_to_top_coords(x2, y2)
u3, v3 = lidar_to_top_coords(x3, y3)
cv2.line(img, (u0, v0), (u1, v1), color, thickness, cv2.LINE_AA)
cv2.line(img, (u1, v1), (u2, v2), color, thickness, cv2.LINE_AA)
cv2.line(img, (u2, v2), (u3, v3), color, thickness, cv2.LINE_AA)
cv2.line(img, (u3, v3), (u0, v0), color, thickness, cv2.LINE_AA)
else:
color = (0, 0, 255) # blue
u0, v0 = lidar_to_top_coords(x0, y0)
u1, v1 = lidar_to_top_coords(x1, y1)
u2, v2 = lidar_to_top_coords(x2, y2)
u3, v3 = lidar_to_top_coords(x3, y3)
cv2.line(img, (u0, v0), (u1, v1), color, thickness, cv2.LINE_AA)
cv2.line(img, (u1, v1), (u2, v2), color, thickness, cv2.LINE_AA)
cv2.line(img, (u2, v2), (u3, v3), color, thickness, cv2.LINE_AA)
cv2.line(img, (u3, v3), (u0, v0), color, thickness, cv2.LINE_AA)
return img
def draw_box3d_predict_on_bev(image, boxes3d, thickness=1, scores=None):
# if scores is not None and scores.shape[0] >0:
img = image.copy()
num = len(boxes3d)
for n in range(num):
b = boxes3d[n]
x0 = b[0, 0]
y0 = b[0, 1]
x1 = b[1, 0]
y1 = b[1, 1]
x2 = b[2, 0]
y2 = b[2, 1]
x3 = b[3, 0]
y3 = b[3, 1]
color = (255, 255, 255) # white
u0, v0 = lidar_to_top_coords(x0, y0)
u1, v1 = lidar_to_top_coords(x1, y1)
u2, v2 = lidar_to_top_coords(x2, y2)
u3, v3 = lidar_to_top_coords(x3, y3)
cv2.line(img, (u0, v0), (u1, v1), color, thickness, cv2.LINE_AA)
cv2.line(img, (u1, v1), (u2, v2), color, thickness, cv2.LINE_AA)
cv2.line(img, (u2, v2), (u3, v3), color, thickness, cv2.LINE_AA)
cv2.line(img, (u3, v3), (u0, v0), color, thickness, cv2.LINE_AA)
return img
def lidar_to_top_coords(x, y, z=None):
if 0:
return x, y
else:
# print("TOP_X_MAX-TOP_X_MIN:",TOP_X_MAX,TOP_X_MIN)
xx = (-y / res).astype(np.int32)
yy = (-x / res).astype(np.int32)
# 調(diào)整坐標(biāo)原點(diǎn)
xx -= int(np.floor(side_range[0]) / res)
yy += int(np.floor(fwd_range[1]) / res)
return xx, yy
# 解析標(biāo)簽數(shù)據(jù)
class Object3d(object):
''' 3d object label '''
def __init__(self, label_file_line):
data = label_file_line.split(' ')
data[1:] = [float(x) for x in data[1:]]
# extract label, truncation, occlusion
self.type = data[0] # 'Car', 'Pedestrian', ...
self.truncation = data[1] # truncated pixel ratio [0..1]
self.occlusion = int(data[2]) # 0=visible, 1=partly occluded, 2=fully occluded, 3=unknown
self.alpha = data[3] # object observation angle [-pi..pi]
# extract 2d bounding box in 0-based coordinates
self.xmin = data[4] # left
self.ymin = data[5] # top
self.xmax = data[6] # right
self.ymax = data[7] # bottom
self.box2d = np.array([self.xmin,self.ymin,self.xmax,self.ymax])
# extract 3d bounding box information
self.h = data[8] # box height
self.w = data[9] # box width
self.l = data[10] # box length (in meters)
self.t = (data[11],data[12],data[13]) # location (x,y,z) in camera coord.
self.ry = data[14] # yaw angle (around Y-axis in camera coordinates) [-pi..pi]
def print_object(self):
print('Type, truncation, occlusion, alpha: %s, %d, %d, %f' % \
(self.type, self.truncation, self.occlusion, self.alpha))
print('2d bbox (x0,y0,x1,y1): %f, %f, %f, %f' % \
(self.xmin, self.ymin, self.xmax, self.ymax))
print('3d bbox h,w,l: %f, %f, %f' % \
(self.h, self.w, self.l))
print('3d bbox location, ry: (%f, %f, %f), %f' % \
(self.t[0],self.t[1],self.t[2],self.ry))
class Calibration(object):
''' Calibration matrices and utils
3d XYZ in <label>.txt are in rect camera coord.
2d box xy are in image2 coord
Points in <lidar>.bin are in Velodyne coord.
y_image2 = P^2_rect * x_rect
y_image2 = P^2_rect * R0_rect * Tr_velo_to_cam * x_velo
x_ref = Tr_velo_to_cam * x_velo
x_rect = R0_rect * x_ref
P^2_rect = [f^2_u, 0, c^2_u, -f^2_u b^2_x;
0, f^2_v, c^2_v, -f^2_v b^2_y;
0, 0, 1, 0]
= K * [1|t]
image2 coord:
----> x-axis (u)
|
|
v y-axis (v)
velodyne coord:
front x, left y, up z
rect/ref camera coord:
right x, down y, front z
Ref (KITTI paper): http://www.cvlibs.net/publications/Geiger2013IJRR.pdf
TODO(rqi): do matrix multiplication only once for each projection.
'''
def __init__(self, calib_filepath, from_video=False):
if from_video:
calibs = self.read_calib_from_video(calib_filepath)
else:
calibs = self.read_calib_file(calib_filepath)
# Projection matrix from rect camera coord to image2 coord
self.P = calibs['P2']
self.P = np.reshape(self.P, [3,4])
# Rigid transform from Velodyne coord to reference camera coord
self.V2C = calibs['Tr_velo_to_cam']
self.V2C = np.reshape(self.V2C, [3,4])
self.C2V = inverse_rigid_trans(self.V2C)
# Rotation from reference camera coord to rect camera coord
self.R0 = calibs['R0_rect']
self.R0 = np.reshape(self.R0,[3,3])
# Camera intrinsics and extrinsics
self.c_u = self.P[0,2]
self.c_v = self.P[1,2]
self.f_u = self.P[0,0]
self.f_v = self.P[1,1]
self.b_x = self.P[0,3]/(-self.f_u) # relative
self.b_y = self.P[1,3]/(-self.f_v)
def read_calib_file(self, filepath):
''' Read in a calibration file and parse into a dictionary.'''
data = {}
with open(filepath, 'r') as f:
for line in f.readlines():
line = line.rstrip()
if len(line)==0: continue
key, value = line.split(':', 1)
# The only non-float values in these files are dates, which
# we don't care about anyway
try:
data[key] = np.array([float(x) for x in value.split()])
except ValueError:
pass
return data
def read_calib_from_video(self, calib_root_dir):
''' Read calibration for camera 2 from video calib files.
there are calib_cam_to_cam and calib_velo_to_cam under the calib_root_dir
'''
data = {}
cam2cam = self.read_calib_file(os.path.join(calib_root_dir, 'calib_cam_to_cam.txt'))
velo2cam = self.read_calib_file(os.path.join(calib_root_dir, 'calib_velo_to_cam.txt'))
Tr_velo_to_cam = np.zeros((3,4))
Tr_velo_to_cam[0:3,0:3] = np.reshape(velo2cam['R'], [3,3])
Tr_velo_to_cam[:,3] = velo2cam['T']
data['Tr_velo_to_cam'] = np.reshape(Tr_velo_to_cam, [12])
data['R0_rect'] = cam2cam['R_rect_00']
data['P2'] = cam2cam['P_rect_02']
return data
def cart2hom(self, pts_3d):
''' Input: nx3 points in Cartesian
Oupput: nx4 points in Homogeneous by pending 1
'''
n = pts_3d.shape[0]
pts_3d_hom = np.hstack((pts_3d, np.ones((n,1))))
return pts_3d_hom
# ===========================
# ------- 3d to 3d ----------
# ===========================
def project_velo_to_ref(self, pts_3d_velo):
pts_3d_velo = self.cart2hom(pts_3d_velo) # nx4
return np.dot(pts_3d_velo, np.transpose(self.V2C))
def project_ref_to_velo(self, pts_3d_ref):
pts_3d_ref = self.cart2hom(pts_3d_ref) # nx4
return np.dot(pts_3d_ref, np.transpose(self.C2V))
def project_rect_to_ref(self, pts_3d_rect):
''' Input and Output are nx3 points '''
return np.transpose(np.dot(np.linalg.inv(self.R0), np.transpose(pts_3d_rect)))
def project_ref_to_rect(self, pts_3d_ref):
''' Input and Output are nx3 points '''
return np.transpose(np.dot(self.R0, np.transpose(pts_3d_ref)))
def project_rect_to_velo(self, pts_3d_rect):
''' Input: nx3 points in rect camera coord.
Output: nx3 points in velodyne coord.
'''
pts_3d_ref = self.project_rect_to_ref(pts_3d_rect)
return self.project_ref_to_velo(pts_3d_ref)
def project_velo_to_rect(self, pts_3d_velo):
pts_3d_ref = self.project_velo_to_ref(pts_3d_velo)
return self.project_ref_to_rect(pts_3d_ref)
def corners3d_to_img_boxes(self, corners3d):
"""
:param corners3d: (N, 8, 3) corners in rect coordinate
:return: boxes: (None, 4) [x1, y1, x2, y2] in rgb coordinate
:return: boxes_corner: (None, 8) [xi, yi] in rgb coordinate
"""
sample_num = corners3d.shape[0]
corners3d_hom = np.concatenate((corners3d, np.ones((sample_num, 8, 1))), axis=2) # (N, 8, 4)
img_pts = np.matmul(corners3d_hom, self.P.T) # (N, 8, 3)
x, y = img_pts[:, :, 0] / img_pts[:, :, 2], img_pts[:, :, 1] / img_pts[:, :, 2]
x1, y1 = np.min(x, axis=1), np.min(y, axis=1)
x2, y2 = np.max(x, axis=1), np.max(y, axis=1)
boxes = np.concatenate((x1.reshape(-1, 1), y1.reshape(-1, 1), x2.reshape(-1, 1), y2.reshape(-1, 1)), axis=1)
boxes_corner = np.concatenate((x.reshape(-1, 8, 1), y.reshape(-1, 8, 1)), axis=2)
return boxes, boxes_corner
# ===========================
# ------- 3d to 2d ----------
# ===========================
def project_rect_to_image(self, pts_3d_rect):
''' Input: nx3 points in rect camera coord.
Output: nx2 points in image2 coord.
'''
pts_3d_rect = self.cart2hom(pts_3d_rect)
pts_2d = np.dot(pts_3d_rect, np.transpose(self.P)) # nx3
pts_2d[:,0] /= pts_2d[:,2]
pts_2d[:,1] /= pts_2d[:,2]
return pts_2d[:,0:2]
def project_velo_to_image(self, pts_3d_velo):
''' Input: nx3 points in velodyne coord.
Output: nx2 points in image2 coord.
'''
pts_3d_rect = self.project_velo_to_rect(pts_3d_velo)
return self.project_rect_to_image(pts_3d_rect)
# ===========================
# ------- 2d to 3d ----------
# ===========================
def project_image_to_rect(self, uv_depth):
''' Input: nx3 first two channels are uv, 3rd channel
is depth in rect camera coord.
Output: nx3 points in rect camera coord.
'''
n = uv_depth.shape[0]
x = ((uv_depth[:,0]-self.c_u)*uv_depth[:,2])/self.f_u + self.b_x
y = ((uv_depth[:,1]-self.c_v)*uv_depth[:,2])/self.f_v + self.b_y
pts_3d_rect = np.zeros((n,3))
pts_3d_rect[:,0] = x
pts_3d_rect[:,1] = y
pts_3d_rect[:,2] = uv_depth[:,2]
return pts_3d_rect
def project_image_to_velo(self, uv_depth):
pts_3d_rect = self.project_image_to_rect(uv_depth)
return self.project_rect_to_velo(pts_3d_rect)
def rotx(t):
''' 3D Rotation about the x-axis. '''
c = np.cos(t)
s = np.sin(t)
return np.array([[1, 0, 0],
[0, c, -s],
[0, s, c]])
def roty(t):
''' Rotation about the y-axis. '''
c = np.cos(t)
s = np.sin(t)
return np.array([[c, 0, s],
[0, 1, 0],
[-s, 0, c]])
def rotz(t):
''' Rotation about the z-axis. '''
c = np.cos(t)
s = np.sin(t)
return np.array([[c, -s, 0],
[s, c, 0],
[0, 0, 1]])
def transform_from_rot_trans(R, t):
''' Transforation matrix from rotation matrix and translation vector. '''
R = R.reshape(3, 3)
t = t.reshape(3, 1)
return np.vstack((np.hstack([R, t]), [0, 0, 0, 1]))
def inverse_rigid_trans(Tr):
''' Inverse a rigid body transform matrix (3x4 as [R|t])
[R'|-R't; 0|1]
'''
inv_Tr = np.zeros_like(Tr) # 3x4
inv_Tr[0:3,0:3] = np.transpose(Tr[0:3,0:3])
inv_Tr[0:3,3] = np.dot(-np.transpose(Tr[0:3,0:3]), Tr[0:3,3])
return inv_Tr
def read_label(label_filename):
lines = [line.rstrip() for line in open(label_filename)]
objects = [Object3d(line) for line in lines]
return objects
def load_image(img_filename):
return cv2.imread(img_filename)
def load_velo_scan(velo_filename):
scan = np.fromfile(velo_filename, dtype=np.float32)
scan = scan.reshape((-1, 4))
return scan
kitti_object.py
from __future__ import print_function
import os
import sys
import cv2
import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
ROOT_DIR = os.path.dirname(BASE_DIR)
sys.path.append(os.path.join(ROOT_DIR, 'mayavi'))
import kitti_util as utils
'''
在圖像中畫2D框、3D框
'''
def show_image_with_boxes(img, objects, calib, show3d=True):
img1 = np.copy(img) # for 2d bbox
img2 = np.copy(img) # for 3d bbox
for obj in objects:
if obj.type=='DontCare':continue
cv2.rectangle(img1, (int(obj.xmin),int(obj.ymin)), (int(obj.xmax),int(obj.ymax)), (0,255,0), 2) # 畫2D框
box3d_pts_2d, box3d_pts_3d = utils.compute_box_3d(obj, calib.P) # 獲取圖像3D框(8*2)、相機(jī)坐標(biāo)系3D框(8*3)
img2 = utils.draw_projected_box3d(img2, box3d_pts_2d) # 在圖像上畫3D框
if show3d:
Image.fromarray(img2).save('save_output/image_with_3Dboxes.png')
Image.fromarray(img2).show()
else:
Image.fromarray(img1).save('save_output/image_with_2Dboxes.png')
Image.fromarray(img1).show()
'''
可視化BEV鳥瞰圖
'''
def show_lidar_topview(pc_velo, objects, calib):
# 1-設(shè)置鳥瞰圖范圍
side_range = (-30, 30) # 左右距離
fwd_range = (0, 80) # 后前距離
x_points = pc_velo[:, 0]
y_points = pc_velo[:, 1]
z_points = pc_velo[:, 2]
# 2-獲得區(qū)域內(nèi)的點(diǎn)
f_filt = np.logical_and(x_points > fwd_range[0], x_points < fwd_range[1])
s_filt = np.logical_and(y_points > side_range[0], y_points < side_range[1])
filter = np.logical_and(f_filt, s_filt)
indices = np.argwhere(filter).flatten()
x_points = x_points[indices]
y_points = y_points[indices]
z_points = z_points[indices]
# 定義了鳥瞰圖中每個像素代表的距離
res = 0.1
# 3-1將點(diǎn)云坐標(biāo)系 轉(zhuǎn)到 BEV坐標(biāo)系
x_img = (-y_points / res).astype(np.int32)
y_img = (-x_points / res).astype(np.int32)
# 3-2調(diào)整坐標(biāo)原點(diǎn)
x_img -= int(np.floor(side_range[0]) / res)
y_img += int(np.floor(fwd_range[1]) / res)
print(x_img.min(), x_img.max(), y_img.min(), y_img.max())
# 4-填充像素值, 將點(diǎn)云數(shù)據(jù)的高度信息(Z坐標(biāo))映射到像素值
height_range = (-3, 1.0)
pixel_value = np.clip(a=z_points, a_max=height_range[1], a_min=height_range[0])
def scale_to_255(a, min, max, dtype=np.uint8):
return ((a - min) / float(max - min) * 255).astype(dtype)
pixel_value = scale_to_255(pixel_value, height_range[0], height_range[1])
# 創(chuàng)建圖像數(shù)組
x_max = 1 + int((side_range[1] - side_range[0]) / res)
y_max = 1 + int((fwd_range[1] - fwd_range[0]) / res)
im = np.zeros([y_max, x_max], dtype=np.uint8)
im[y_img, x_img] = pixel_value
im2 = Image.fromarray(im)
im2.save('save_output/BEV.png')
im2.show()
'''
將點(diǎn)云數(shù)據(jù)3D框投影到BEV
'''
def show_lidar_topview_with_boxes(img, objects, calib):
def bbox3d(obj):
box3d_pts_2d, box3d_pts_3d = utils.compute_box_3d(obj, calib.P) # 獲取3D框-圖像、3D框-相機(jī)坐標(biāo)系
box3d_pts_3d_velo = calib.project_rect_to_velo(box3d_pts_3d) # 將相機(jī)坐標(biāo)系的框 轉(zhuǎn)到 激光雷達(dá)坐標(biāo)系
return box3d_pts_3d_velo # 返回nx3的點(diǎn)
boxes3d = [bbox3d(obj) for obj in objects if obj.type == "Car"]
gt = np.array(boxes3d)
im2 = utils.draw_box3d_label_on_bev(img, gt, scores=None, thickness=1) # 獲取激光雷達(dá)坐標(biāo)系的3D點(diǎn),選擇x, y兩維,畫到BEV平面坐標(biāo)系上
im2 = Image.fromarray(im2)
im2.save('save_output/BEV with boxes.png')
im2.show()
'''
將點(diǎn)云數(shù)據(jù)投影到圖像
'''
def show_lidar_on_image(pc_velo, img, calib, img_width, img_height):
''' Project LiDAR points to image '''
imgfov_pc_velo, pts_2d, fov_inds = get_lidar_in_image_fov(pc_velo,
calib, 0, 0, img_width, img_height, True)
imgfov_pts_2d = pts_2d[fov_inds,:]
imgfov_pc_rect = calib.project_velo_to_rect(imgfov_pc_velo)
import matplotlib.pyplot as plt
cmap = plt.cm.get_cmap('hsv', 256)
cmap = np.array([cmap(i) for i in range(256)])[:,:3]*255
for i in range(imgfov_pts_2d.shape[0]):
depth = imgfov_pc_rect[i,2]
color = cmap[int(640.0/depth),:]
cv2.circle(img, (int(np.round(imgfov_pts_2d[i,0])),
int(np.round(imgfov_pts_2d[i,1]))),
2, color=tuple(color), thickness=-1)
Image.fromarray(img).save('save_output/lidar_on_image.png')
Image.fromarray(img).show()
return img
'''
將點(diǎn)云數(shù)據(jù)投影到相機(jī)坐標(biāo)系
'''
def get_lidar_in_image_fov(pc_velo, calib, xmin, ymin, xmax, ymax,
return_more=False, clip_distance=2.0):
''' Filter lidar points, keep those in image FOV '''
pts_2d = calib.project_velo_to_image(pc_velo)
fov_inds = (pts_2d[:,0]<xmax) & (pts_2d[:,0]>=xmin) & \
(pts_2d[:,1]<ymax) & (pts_2d[:,1]>=ymin)
fov_inds = fov_inds & (pc_velo[:,0]>clip_distance)
imgfov_pc_velo = pc_velo[fov_inds,:]
if return_more:
return imgfov_pc_velo, pts_2d, fov_inds
else:
return imgfov_pc_velo
'''
解析標(biāo)簽
'''
class kitti_object(object):
'''Load and parse object data into a usable format.'''
def __init__(self, root_dir, split='training'):
'''root_dir contains training and testing folders'''
self.root_dir = root_dir
self.split = split
self.split_dir = os.path.join(root_dir, split)
if split == 'training':
self.num_samples = 7481
elif split == 'testing':
self.num_samples = 7518
else:
print('Unknown split: %s' % (split))
exit(-1)
self.image_dir = os.path.join(self.split_dir, 'image_2')
self.calib_dir = os.path.join(self.split_dir, 'calib')
self.lidar_dir = os.path.join(self.split_dir, 'velodyne')
self.label_dir = os.path.join(self.split_dir, 'label_2')
def __len__(self):
return self.num_samples
def get_image(self, idx):
assert(idx<self.num_samples)
img_filename = os.path.join(self.image_dir, '%06d.png'%(idx))
return utils.load_image(img_filename)
def get_lidar(self, idx):
assert(idx<self.num_samples)
lidar_filename = os.path.join(self.lidar_dir, '%06d.bin'%(idx))
return utils.load_velo_scan(lidar_filename)
def get_calibration(self, idx):
assert(idx<self.num_samples)
calib_filename = os.path.join(self.calib_dir, '%06d.txt'%(idx))
return utils.Calibration(calib_filename)
def get_label_objects(self, idx):
assert(idx<self.num_samples and self.split=='training')
label_filename = os.path.join(self.label_dir, '%06d.txt'%(idx))
return utils.read_label(label_filename)
def get_depth_map(self, idx):
pass
def get_top_down(self, idx):
pass
運(yùn)行程序后kitti_vis_main.py后,回保存5張結(jié)果圖片
文章來源:http://www.zghlxwxcb.cn/news/detail-708010.html
后面還會介紹Nuscenes、Waymo等3D數(shù)據(jù)集。文章來源地址http://www.zghlxwxcb.cn/news/detail-708010.html
到了這里,關(guān)于3D目標(biāo)檢測數(shù)據(jù)集 KITTI(標(biāo)簽格式解析、點(diǎn)云轉(zhuǎn)圖像、點(diǎn)云轉(zhuǎn)BEV)的文章就介紹完了。如果您還想了解更多內(nèi)容,請?jiān)谟疑辖撬阉鱐OY模板網(wǎng)以前的文章或繼續(xù)瀏覽下面的相關(guān)文章,希望大家以后多多支持TOY模板網(wǎng)!