Contents
1. Environment setup
2. Exporting the .torchscript.pt model
3. C++ implementation of yolov5.4
4. Problem log
4.1 Comment out the headers below in detector.h
4.2 Error: "std": ambiguous symbol
4.3 Keep a debug build of libtorch on hand
4.4 Problem: the build succeeds, but torch::cuda::is_available() returns false at runtime
4.5 Warnings on the command line when exporting the model
5. References
1. Environment setup
- win10
- vs2017
- libtorch-win-shared-with-deps-debug-1.8.1+cpu
- opencv349
Because the YOLOv5 author is still actively updating the code (at the time of writing the latest release was 5.4 — yes, this post was written a few years ago), the model structure may change between releases, so the libtorch we use must satisfy the requirements of that release, ideally matching it exactly. I provide the Python source of the YOLOv5 version used in this post.
The requirements.txt in that source asks for the dependency versions listed below. On the C++ side we use libtorch 1.8.1 (today I also tested libtorch-win-shared-with-deps-1.7.1+cu110, which detects correctly and gives the same final results as this post). All image processing is done with OpenCV in C++, so the C++ build of torchvision is not needed:
# pip install -r requirements.txt
# base ----------------------------------------
matplotlib>=3.2.2
numpy>=1.18.5
opencv-python>=4.1.2
Pillow
PyYAML>=5.3.1
scipy>=1.4.1
torch>=1.7.0
torchvision>=0.8.1
# the remaining entries are omitted
For easier debugging I downloaded the debug, CPU-only build of libtorch; once the code works on CPU, switching to GPU is straightforward. The OpenCV version barely matters — anything from OpenCV 3.x upward is fine.
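A quick way to confirm the Python side satisfies these requirements before exporting (a minimal sketch — it just prints the installed versions for comparison against requirements.txt):

import torch
import torchvision
import cv2

# compare against requirements.txt: torch>=1.7.0, torchvision>=0.8.1, opencv-python>=4.1.2
print('torch:', torch.__version__)
print('torchvision:', torchvision.__version__)
print('opencv-python:', cv2.__version__)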
2. Exporting the .torchscript.pt model
Open the models folder in the yolov5.4 source tree and edit export.py as shown below: comment out line 58 and add a new line 59 (the GPU version needs a few more changes and will be covered in a later post; this one only deals with CPU).
"""Exports a YOLOv5 *.pt model to ONNX and TorchScript formats
Usage:
$ export PYTHONPATH="$PWD" && python models/export.py --weights ./weights/yolov5s.pt --img 640 --batch 1
"""
import argparse
import sys
import time
sys.path.append('./') # to run '$ python *.py' files in subdirectories
import torch
import torch.nn as nn
import models
from models.experimental import attempt_load
from utils.activations import Hardswish, SiLU
from utils.general import set_logging, check_img_size
from utils.torch_utils import select_device
if __name__ == '__main__':
parser = argparse.ArgumentParser()
parser.add_argument('--weights', type=str, default='./yolov5s.pt', help='weights path') # from yolov5/models/
parser.add_argument('--img-size', nargs='+', type=int, default=[640, 640], help='image size') # height, width
parser.add_argument('--batch-size', type=int, default=1, help='batch size')
parser.add_argument('--dynamic', action='store_true', help='dynamic ONNX axes')
parser.add_argument('--grid', action='store_true', help='export Detect() layer grid')
parser.add_argument('--device', default='cpu', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
opt = parser.parse_args()
opt.img_size *= 2 if len(opt.img_size) == 1 else 1 # expand
print(opt)
set_logging()
t = time.time()
# Load PyTorch model
device = select_device(opt.device)
model = attempt_load(opt.weights, map_location=device) # load FP32 model
labels = model.names
# Checks
gs = int(max(model.stride)) # grid size (max stride)
opt.img_size = [check_img_size(x, gs) for x in opt.img_size] # verify img_size are gs-multiples
# Input
img = torch.zeros(opt.batch_size, 3, *opt.img_size).to(device) # image size(1,3,320,192) iDetection
# Update model
for k, m in model.named_modules():
m._non_persistent_buffers_set = set() # pytorch 1.6.0 compatibility
if isinstance(m, models.common.Conv): # assign export-friendly activations
if isinstance(m.act, nn.Hardswish):
m.act = Hardswish()
elif isinstance(m.act, nn.SiLU):
m.act = SiLU()
# elif isinstance(m, models.yolo.Detect):
# m.forward = m.forward_export # assign forward (optional)
#model.model[-1].export = not opt.grid # set Detect() layer grid export
model.model[-1].export = False
y = model(img) # dry run
# TorchScript export
try:
print('\nStarting TorchScript export with torch %s...' % torch.__version__)
f = opt.weights.replace('.pt', '.torchscript.pt') # filename
ts = torch.jit.trace(model, img)
ts.save(f)
print('TorchScript export success, saved as %s' % f)
except Exception as e:
print('TorchScript export failure: %s' % e)
    # the rest of the script is omitted; no changes needed there
......
Next, activate the yolov5.4 conda environment and run the following command:
python models/export.py --weights ./weights/yolov5s.pt --img 640 --batch 1
Troubleshooting: 1. The bash window may complain that the utils module cannot be found; this happens because the repository root is not on PYTHONPATH. On Linux, just run:
export PYTHONPATH="$PWD"
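On a Windows cmd prompt the equivalent (run from the yolov5.4 repository root) should be:
set PYTHONPATH=%cd%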
On Windows, however, I suggest simply opening the yolov5.4 project in PyCharm and running export.py from the IDE. If you have not downloaded yolov5s.pt yet, the script downloads it automatically and prints the download link on the console, as shown below; if the download stalls, try pasting the link into a download manager such as Thunder.
Downloading https://github.com/ultralytics/yolov5/releases/download/v4.0/yolov5s.pt to yolov5s.pt...
Running export.py prints the following warnings:
D:\yolov5-0327\models\yolo.py:50: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if self.grid[i].shape[2:4] != x[i].shape[2:4]:
D:\Program Files\Anaconda\envs\yolov5\lib\site-packages\torch\jit\_trace.py:934: TracerWarning: Encountering a list at the output of the tracer might cause the trace to be incorrect, this is only valid if the container structure does not change based on the module's inputs. Consider using a constant container instead (e.g. for `list`, use a `tuple` instead. for `dict`, use a `NamedTuple` instead). If you absolutely need this and know the side effects, pass strict=False to trace() to allow this behavior.
module._c._create_method_from_trace(
TorchScript export success, saved as ./yolov5s.torchscript.pt
ONNX export failure: No module named 'onnx'
CoreML export failure: No module named 'coremltools'
Export complete (10.94s). Visualize with https://github.com/lutzroeder/netron.
I will look into these warnings another time; they do not affect deployment.
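Before moving on to C++, it is worth confirming in Python that the exported file actually loads and runs. A minimal check (a sketch; the file name follows the export above, and the printed shape assumes a 640x640 input with 80 classes):

import torch

# load the traced model produced by export.py
model = torch.jit.load('yolov5s.torchscript.pt', map_location='cpu')
model.eval()

# dummy input matching the export: batch 1, 3 channels, 640x640
img = torch.zeros(1, 3, 640, 640)

with torch.no_grad():
    out = model(img)

# with the Detect() grid export disabled, the traced model returns a tuple;
# the first element holds the raw predictions, e.g. shape [1, 25200, 85]
print(type(out), out[0].shape)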
3. C++ implementation of yolov5.4
Configure libtorch in the Visual Studio project properties (set the following items):
include:
D:\libtorch-win-shared-with-deps-debug-1.8.1+cpu\libtorch\include
D:\libtorch-win-shared-with-deps-debug-1.8.1+cpu\libtorch\include\torch\csrc\api\include
lib:
D:\libtorch-win-shared-with-deps-debug-1.8.1+cpu\libtorch\lib
Additional dependencies (you may be using a newer libtorch, so paste the names of all the .lib files actually present in your lib directory into Linker → Input → Additional Dependencies):
asmjit.lib
c10.lib
c10d.lib
c10_cuda.lib
caffe2_detectron_ops_gpu.lib
caffe2_module_test_dynamic.lib
caffe2_nvrtc.lib
clog.lib
cpuinfo.lib
dnnl.lib
fbgemm.lib
fbjni.lib
gloo.lib
gloo_cuda.lib
libprotobuf-lite.lib
libprotobuf.lib
libprotoc.lib
mkldnn.lib
pthreadpool.lib
pytorch_jni.lib
torch.lib
torch_cpu.lib
torch_cuda.lib
XNNPACK.lib
Environment variable (a restart is required for it to take effect):
D:\libtorch-win-shared-with-deps-debug-1.8.1+cpu\libtorch\lib
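Before dropping in the full detector, a minimal program (a sketch) is useful for checking that the include paths, libraries and PATH entry above are all picked up:

#include <torch/torch.h>
#include <iostream>

int main()
{
    // a trivial tensor op: if this compiles, links and prints a 2x3 matrix,
    // the libtorch configuration above is working
    torch::Tensor t = torch::rand({ 2, 3 });
    std::cout << t << std::endl;
    return 0;
}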
Once everything is configured, set vs2017 to Debug x64. Below is the yolov5.4 C++ code.
The inputs are:
- the .torchscript.pt model exported above
- coco.names
- one test image
Full libtorch C++ source:
#include <torch/script.h>
#include <torch/torch.h>
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <fstream>   // std::ifstream
#include <iomanip>   // std::setprecision
#include <sstream>   // std::stringstream
#include <iostream>

std::vector<std::string> LoadNames(const std::string& path)
{
    // load class names
    std::vector<std::string> class_names;
    std::ifstream infile(path);
    if (infile.is_open()) {
        std::string line;
        while (std::getline(infile, line)) {
            class_names.emplace_back(line);
        }
        infile.close();
    }
    else {
        std::cerr << "Error loading the class names!\n";
    }
    return class_names;
}
std::vector<float> LetterboxImage(const cv::Mat& src, cv::Mat& dst, const cv::Size& out_size)
{
    auto in_h = static_cast<float>(src.rows);
    auto in_w = static_cast<float>(src.cols);
    float out_h = out_size.height;
    float out_w = out_size.width;
    float scale = std::min(out_w / in_w, out_h / in_h);

    int mid_h = static_cast<int>(in_h * scale);
    int mid_w = static_cast<int>(in_w * scale);

    cv::resize(src, dst, cv::Size(mid_w, mid_h));

    int top = (static_cast<int>(out_h) - mid_h) / 2;
    int down = (static_cast<int>(out_h) - mid_h + 1) / 2;
    int left = (static_cast<int>(out_w) - mid_w) / 2;
    int right = (static_cast<int>(out_w) - mid_w + 1) / 2;

    cv::copyMakeBorder(dst, dst, top, down, left, right, cv::BORDER_CONSTANT, cv::Scalar(114, 114, 114));

    std::vector<float> pad_info{ static_cast<float>(left), static_cast<float>(top), scale };
    return pad_info;
}
enum Det
{
    tl_x = 0,
    tl_y = 1,
    br_x = 2,
    br_y = 3,
    score = 4,
    class_idx = 5
};

struct Detection
{
    cv::Rect bbox;
    float score;
    int class_idx;
};

void Tensor2Detection(const at::TensorAccessor<float, 2>& offset_boxes,
    const at::TensorAccessor<float, 2>& det,
    std::vector<cv::Rect>& offset_box_vec,
    std::vector<float>& score_vec)
{
    for (int i = 0; i < offset_boxes.size(0); i++) {
        offset_box_vec.emplace_back(
            cv::Rect(cv::Point(offset_boxes[i][Det::tl_x], offset_boxes[i][Det::tl_y]),
                cv::Point(offset_boxes[i][Det::br_x], offset_boxes[i][Det::br_y]))
        );
        score_vec.emplace_back(det[i][Det::score]);
    }
}
void ScaleCoordinates(std::vector<Detection>& data, float pad_w, float pad_h,
    float scale, const cv::Size& img_shape)
{
    auto clip = [](float n, float lower, float upper)
    {
        return std::max(lower, std::min(n, upper));
    };

    std::vector<Detection> detections;
    for (auto & i : data) {
        float x1 = (i.bbox.tl().x - pad_w) / scale;  // x padding
        float y1 = (i.bbox.tl().y - pad_h) / scale;  // y padding
        float x2 = (i.bbox.br().x - pad_w) / scale;  // x padding
        float y2 = (i.bbox.br().y - pad_h) / scale;  // y padding

        x1 = clip(x1, 0, img_shape.width);
        y1 = clip(y1, 0, img_shape.height);
        x2 = clip(x2, 0, img_shape.width);
        y2 = clip(y2, 0, img_shape.height);

        i.bbox = cv::Rect(cv::Point(x1, y1), cv::Point(x2, y2));
    }
}

torch::Tensor xywh2xyxy(const torch::Tensor& x)
{
    auto y = torch::zeros_like(x);
    // convert bounding box format from (center x, center y, width, height) to (x1, y1, x2, y2)
    y.select(1, Det::tl_x) = x.select(1, 0) - x.select(1, 2).div(2);
    y.select(1, Det::tl_y) = x.select(1, 1) - x.select(1, 3).div(2);
    y.select(1, Det::br_x) = x.select(1, 0) + x.select(1, 2).div(2);
    y.select(1, Det::br_y) = x.select(1, 1) + x.select(1, 3).div(2);
    return y;
}
std::vector<std::vector<Detection>> PostProcessing(const torch::Tensor& detections,
    float pad_w, float pad_h, float scale, const cv::Size& img_shape,
    float conf_thres, float iou_thres)
{
    /***
     * detections: raw predictions of shape [batch, num_boxes, 4 box coords + 1 objectness + num_classes]
     * the final per-box result after NMS holds: top-left x/y, bottom-right x/y, score, class id
     */
    constexpr int item_attr_size = 5;
    int batch_size = detections.size(0);
    // number of classes, e.g. 80 for coco dataset
    auto num_classes = detections.size(2) - item_attr_size;

    // get candidates which object confidence > threshold
    auto conf_mask = detections.select(2, 4).ge(conf_thres).unsqueeze(2);

    std::vector<std::vector<Detection>> output;
    output.reserve(batch_size);

    // iterating all images in the batch
    for (int batch_i = 0; batch_i < batch_size; batch_i++) {
        // apply constrains to get filtered detections for current image
        auto det = torch::masked_select(detections[batch_i], conf_mask[batch_i]).view({ -1, num_classes + item_attr_size });

        // if none detections remain then skip and start to process next image
        if (0 == det.size(0)) {
            continue;
        }

        // compute overall score = obj_conf * cls_conf, similar to x[:, 5:] *= x[:, 4:5]
        det.slice(1, item_attr_size, item_attr_size + num_classes) *= det.select(1, 4).unsqueeze(1);

        // box (center x, center y, width, height) to (x1, y1, x2, y2)
        torch::Tensor box = xywh2xyxy(det.slice(1, 0, 4));

        // [best class only] get the max classes score at each result (e.g. elements 5-84)
        std::tuple<torch::Tensor, torch::Tensor> max_classes = torch::max(det.slice(1, item_attr_size, item_attr_size + num_classes), 1);

        // class score
        auto max_conf_score = std::get<0>(max_classes);
        // index
        auto max_conf_index = std::get<1>(max_classes);

        max_conf_score = max_conf_score.to(torch::kFloat).unsqueeze(1);
        max_conf_index = max_conf_index.to(torch::kFloat).unsqueeze(1);

        // shape: n * 6, top-left x/y (0,1), bottom-right x/y (2,3), score(4), class index(5)
        det = torch::cat({ box.slice(1, 0, 4), max_conf_score, max_conf_index }, 1);

        // for batched NMS
        constexpr int max_wh = 4096;
        auto c = det.slice(1, item_attr_size, item_attr_size + 1) * max_wh;
        auto offset_box = det.slice(1, 0, 4) + c;

        std::vector<cv::Rect> offset_box_vec;
        std::vector<float> score_vec;

        // copy data back to cpu
        auto offset_boxes_cpu = offset_box.cpu();
        auto det_cpu = det.cpu();
        const auto& det_cpu_array = det_cpu.accessor<float, 2>();

        // use accessor to access tensor elements efficiently
        Tensor2Detection(offset_boxes_cpu.accessor<float, 2>(), det_cpu_array, offset_box_vec, score_vec);

        // run NMS
        std::vector<int> nms_indices;
        cv::dnn::NMSBoxes(offset_box_vec, score_vec, conf_thres, iou_thres, nms_indices);

        std::vector<Detection> det_vec;
        for (int index : nms_indices) {
            Detection t;
            const auto& b = det_cpu_array[index];
            t.bbox =
                cv::Rect(cv::Point(b[Det::tl_x], b[Det::tl_y]),
                    cv::Point(b[Det::br_x], b[Det::br_y]));
            t.score = det_cpu_array[index][Det::score];
            t.class_idx = det_cpu_array[index][Det::class_idx];
            det_vec.emplace_back(t);
        }

        ScaleCoordinates(det_vec, pad_w, pad_h, scale, img_shape);

        // save final detection for the current image
        output.emplace_back(det_vec);
    }  // end of batch iterating

    return output;
}
void Demo(cv::Mat& img,
    const std::vector<std::vector<Detection>>& detections,
    const std::vector<std::string>& class_names,
    bool label = true)
{
    if (!detections.empty()) {
        for (const auto& detection : detections[0]) {
            const auto& box = detection.bbox;
            float score = detection.score;
            int class_idx = detection.class_idx;

            cv::rectangle(img, box, cv::Scalar(0, 0, 255), 2);

            if (label) {
                std::stringstream ss;
                ss << std::fixed << std::setprecision(2) << score;
                std::string s = class_names[class_idx] + " " + ss.str();

                auto font_face = cv::FONT_HERSHEY_DUPLEX;
                auto font_scale = 1.0;
                int thickness = 1;
                int baseline = 0;
                auto s_size = cv::getTextSize(s, font_face, font_scale, thickness, &baseline);
                cv::rectangle(img,
                    cv::Point(box.tl().x, box.tl().y - s_size.height - 5),
                    cv::Point(box.tl().x + s_size.width, box.tl().y),
                    cv::Scalar(0, 0, 255), -1);
                cv::putText(img, s, cv::Point(box.tl().x, box.tl().y - 5),
                    font_face, font_scale, cv::Scalar(255, 255, 255), thickness);
            }
        }
    }

    cv::namedWindow("Result", cv::WINDOW_NORMAL);
    cv::imshow("Result", img);
}
int main()
{
    // models exported from other yolov5 releases (e.g. yolov5Ns.torchscript.pt) fail to load, so only the yolov5.4 export can be read here
    torch::jit::script::Module module = torch::jit::load("yolov5sxxx.torchscript.pt");
    torch::DeviceType device_type = torch::kCPU;
    module.to(device_type);
    /*module.to(torch::kHalf);*/
    module.eval();

    // img must be read as a 3-channel image
    cv::Mat img = cv::imread("zidane.jpg", -1);

    // load class names
    std::vector<std::string> class_names = LoadNames("coco.names");
    if (class_names.empty()) {
        return -1;
    }

    // set up thresholds
    float conf_thres = 0.4;
    float iou_thres = 0.5;

    // inference
    torch::NoGradGuard no_grad;
    cv::Mat img_input = img.clone();
    std::vector<float> pad_info = LetterboxImage(img_input, img_input, cv::Size(640, 640));
    const float pad_w = pad_info[0];
    const float pad_h = pad_info[1];
    const float scale = pad_info[2];

    cv::cvtColor(img_input, img_input, cv::COLOR_BGR2RGB);  // BGR -> RGB
    // normalization requires floating-point data
    img_input.convertTo(img_input, CV_32FC3, 1.0f / 255.0f);  // normalization 1/255

    // move the image tensor onto the device
    auto tensor_img = torch::from_blob(img_input.data, { 1, img_input.rows, img_input.cols, img_input.channels() }).to(device_type);

    // BHWC -> BCHW
    tensor_img = tensor_img.permute({ 0, 3, 1, 2 }).contiguous();  // BHWC -> BCHW (Batch, Channel, Height, Width)

    std::vector<torch::jit::IValue> inputs;
    // emplace_back constructs the element in place, avoiding an extra copy/move
    inputs.emplace_back(tensor_img);

    torch::jit::IValue output = module.forward(inputs);

    // parse the result
    auto detections = output.toTuple()->elements()[0].toTensor();
    auto result = PostProcessing(detections, pad_w, pad_h, scale, img.size(), conf_thres, iou_thres);

    // visualize detections
    if (true) {
        Demo(img, result, class_names);
        cv::waitKey(0);
    }

    return 1;
}
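For reference, the CPU-to-GPU switch mentioned in section 1 mostly comes down to selecting the CUDA device and moving both the module and the input tensor onto it. A rough sketch only — it assumes a CUDA build of libtorch and a model exported for GPU, which needs the extra export.py changes noted in section 2:

// choose CUDA when available, otherwise fall back to CPU
torch::DeviceType device_type = torch::cuda::is_available() ? torch::kCUDA : torch::kCPU;
torch::Device device(device_type);

torch::jit::script::Module module = torch::jit::load("yolov5s.torchscript.pt");
module.to(device);
module.eval();

// preprocessing stays the same; only the input tensor moves to the GPU
// auto tensor_img = torch::from_blob(...).to(device);
// torch::jit::IValue output = module.forward({ tensor_img });
// copy predictions back to the CPU before the accessor-based post-processing:
// auto detections = output.toTuple()->elements()[0].toTensor().cpu();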
4. Problem log
I worked from the code in references [1] and [2] — many thanks. The code in [1] and [2] is identical (no idea who copied whom), it does not state which YOLOv5 version it targets, and it has quite a few problems, but it was still a valuable starting point.
4.1 Comment out the headers below in detector.h
//#include <c10/cuda/CUDAStream.h>
//#include <ATen/cuda/CUDAEvent.h>
4.2 Error: "std": ambiguous symbol
Fix 1: Project → Properties → C/C++ → Language → Conformance mode → set to No.
(Double-check whether the property page you are editing applies to Debug or Release — a lesson learned the hard way!)
Fix 2: one post suggests rewriting the offending std as "::std" wherever the error appears; not recommended!
4.3 Keep a debug build of libtorch on hand
With libtorch, when execution reaches the line that loads the model it jumps into an assert inside the library: AT_ASSERT(isTuple(), "Expected Tuple but got ", tagKind()); (because we are on the debug build of libtorch we can at least land on that line; with a release build you would have no idea where it failed, so keeping a debug build on hand is well worth it).
The likely cause is that the model was exported from a YOLOv5 release other than 5.4 (for example 5.3.1, 5.3 or 5.1), or that export.py was not modified as described above; a defensive check of the output type is sketched after the reference below.
Reference: https://blog.csdn.net/weixin_42398658/article/details/111954760
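To see what the traced model actually returns before the code asserts, the IValue can be inspected first. A sketch (a drop-in for the lines around module.forward in main above):

torch::jit::IValue output = module.forward(inputs);

// with the Detect() grid export disabled the output is a tuple whose first
// element is the prediction tensor; other exports may return a bare tensor
torch::Tensor detections;
if (output.isTuple()) {
    detections = output.toTuple()->elements()[0].toTensor();
}
else if (output.isTensor()) {
    detections = output.toTensor();
}
else {
    std::cerr << "Unexpected model output type: " << output.tagKind() << std::endl;
    return -1;
}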
4.4 Problem: the build succeeds, but torch::cuda::is_available() returns false at runtime
Fix:
- a. When configuring the environment, paste the names of all ".lib" files in the lib folder into Project Properties (Release) → Linker → Input → Additional Dependencies
- b. In Project Properties (Release) → Linker → Command Line → Additional Options, add the following switch
/INCLUDE:?warp_size@cuda@at@@YAHXZ
That solved it completely! A quick way to verify the fix is sketched below.
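A small check (a sketch) to confirm CUDA is visible after relinking with the /INCLUDE workaround:

#include <torch/torch.h>
#include <iostream>

int main()
{
    // should print true and a device count >= 1 once the workaround is in place
    std::cout << "CUDA available: " << std::boolalpha << torch::cuda::is_available() << std::endl;
    std::cout << "CUDA device count: " << torch::cuda::device_count() << std::endl;
    return 0;
}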
4.5 Warnings on the command line when exporting the model
The warnings printed on the console during the export above are still unresolved, but after deployment the detection results differ only marginally from the Python version (in practice they are almost identical), as shown below:
Below, the left image is the official (Python) result and the right is the libtorch deployment result; the confidence scores are on par. Happy with that!
Notice that the tie of the person on the right is not detected; that is because we are using the 5s model, and in recent YOLOv5 versions the author's changes favour the accuracy of the 5x model, so 5s performance has dropped slightly.
5. References
[1] libtorch reference code: https://zhuanlan.zhihu.com/p/338167520
[2] libtorch reference code: https://gitee.com/goodtn/libtorch-yolov5-gpu/tree/master
[3] libtorch error roundup (very helpful!): https://blog.csdn.net/qq_18305555/article/details/114013236