Deploying deep-learning models with C++ and TensorRT offers several key advantages, which matter across a wide range of industrial and commercial applications:

- High performance: TensorRT optimizes deep-learning models to speed up inference and reduce latency, which is critical for real-time applications such as video analytics and robot navigation.
- Lower resource consumption: TensorRT optimizes models to run efficiently on the GPU, meaning a smaller memory footprint and higher throughput. This is a significant advantage in resource-constrained environments or when many tasks run in parallel.
- Cross-platform and hardware compatibility: C++ is a cross-platform language; combined with TensorRT, models can be deployed on a variety of hardware and operating systems, from embedded devices to servers.
- Accuracy and stability: TensorRT applies careful numerical methods (and, for reduced precision, calibration) to limit floating-point error, helping keep deep-learning applications accurate and stable.
- Customization and flexibility: with C++ and TensorRT, developers can heavily customize their applications, including adjusting model structure, optimizing algorithms, and tuning performance parameters to meet specific requirements.
- Support for complex networks and large-scale deployment: TensorRT supports recent deep-learning network architectures and handles demanding computational workloads, which is essential for industrial applications that deploy large, complex models.
- Easy integration and extension: C++ integrates flexibly with other systems and tools (databases, network services, and so on), and TensorRT integrates cleanly with the rest of the NVIDIA toolchain (CUDA, cuDNN, etc.).
I. Preparation
Download the YOLOv8 project and the TensorRT deployment project. For the TensorRT C++ code, use:
https://github.com/xiaocao-tian/yolov8_tensorrt
For YOLOv8 itself, refer to the ultralytics setup from the earlier posts in this series.
Create a weights folder inside ultralytics and place yolov8s.pt in it.
Copy gen_wts.py from the src directory into ultralytics.
Run gen_wts.py to generate yolov8s.wts.
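A typical invocation looks like this (the -w/-o flags follow the usual tensorrtx convention and are an assumption here; check the argparse section of gen_wts.py for the exact options):

python gen_wts.py -w weights/yolov8s.pt -o weights/yolov8s.wts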
Then copy the weights folder into the yolov8 TensorRT project.
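After copying, the deployment project should look roughly like this (a sketch inferred from the paths referenced in the CMakeLists.txt below, not an exhaustive listing):

yolov8_tensorrt/
├── CMakeLists.txt
├── include/
├── plugin/
│   └── yololayer.cu
├── src/
│   ├── main.cpp
│   └── gen_wts.py
└── weights/
    └── yolov8s.wts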
II. Environment Setup
1. Visual Studio
I installed Visual Studio 2022 with only the "Desktop development with C++" workload.
Pitfall 1: Pay special attention to the install order: install Visual Studio first, then CUDA. Otherwise the CUDA runtime project templates will not appear in Visual Studio. The CUDA installer ships a Visual Studio integration plugin, and it is configured automatically only if Visual Studio is already installed.
Fix 1: After installing VS 2022, reinstall CUDA.
For CUDA and cuDNN installation, see: yolov8實(shí)戰(zhàn)第一天——yolov8部署并訓(xùn)練自己的數(shù)據(jù)集(保姆式教程) (CSDN blog).
2. CMake
From the CMake download archive ("Index of /files"), download:
cmake-3.28.0-rc1-windows-x86_64.msi
This is the installer version; add CMake to the PATH environment variable yourself.
Pitfall 2: Verify that CMake installed successfully.
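A quick check from a fresh terminal:

cmake --version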
If the version string prints, CMake is installed successfully.
3. OpenCV and TensorRT
OpenCV installation: C++實(shí)戰(zhàn)Opencv第一天——win11下配置vs,opencv環(huán)境和運(yùn)行第一個(gè)c++代碼(從零開始,保姆教學(xué)) (CSDN blog).
TensorRT installation: yolov8實(shí)戰(zhàn)第三天——yolov8TensorRT部署(python推理)(保姆教學(xué)) (CSDN blog).
Pitfall 3: environment variable configuration.
Fix 3: add the OpenCV, TensorRT, and cuDNN library/bin directories to the PATH environment variable so their DLLs can be found at runtime.
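For reference, with the install locations used later in this post, the PATH additions would look roughly like this (paths are examples from this machine; the OpenCV x64\vc16\bin subfolder is an assumption for the prebuilt Windows package, and the cuDNN DLLs are assumed to have been copied into the CUDA bin directory):

D:\CUDA\bin
E:\TensorRT-8.6.1.6\lib
E:\opencv\opencv\build\x64\vc16\bin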
At this point VS, CMake, CUDA, cuDNN, OpenCV, and TensorRT are all configured.
III. Compilation
Create a build folder in the TensorRT project, then configure the build with CMake. Pay attention to the project's CMakeLists.txt: just point OpenCV_DIR and TRT_DIR at your own OpenCV and TensorRT installations.
cmake_minimum_required(VERSION 3.10)
project(yolov8)

# Modify to your paths
set(OpenCV_DIR "E:/opencv/opencv/build")
set(TRT_DIR "E:/TensorRT-8.6.1.6")

add_definitions(-std=c++11)
add_definitions(-DAPI_EXPORTS)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_BUILD_TYPE Debug)

# setup CUDA
find_package(CUDA REQUIRED)
message(STATUS "libraries: ${CUDA_LIBRARIES}")
message(STATUS "include path: ${CUDA_INCLUDE_DIRS}")
include_directories(${CUDA_INCLUDE_DIRS})
enable_language(CUDA)

include_directories(${PROJECT_SOURCE_DIR}/include)
include_directories(${PROJECT_SOURCE_DIR}/plugin)

# TensorRT (reuse the root path set above)
set(TENSORRT_ROOT ${TRT_DIR})
include_directories("${TENSORRT_ROOT}/include")
link_directories("${TENSORRT_ROOT}/lib")

# OpenCV
find_package(OpenCV REQUIRED)
include_directories(${OpenCV_INCLUDE_DIRS})

# The custom YOLO decode layer is built as a separate plugin library
add_library(myplugins SHARED ${PROJECT_SOURCE_DIR}/plugin/yololayer.cu)
target_link_libraries(myplugins nvinfer cudart)

# Collect all sources (the glob already picks up src/main.cpp)
file(GLOB_RECURSE SRCS ${PROJECT_SOURCE_DIR}/src/*.cpp ${PROJECT_SOURCE_DIR}/src/*.cu)
add_executable(yolov8 ${SRCS})
target_link_libraries(yolov8 nvinfer cudart myplugins ${OpenCV_LIBS})
In the CMake GUI, set "Where is the source code" to the project root and "Where to build the binaries" to the new build folder, then click Configure (choosing the Visual Studio 17 2022 generator and the x64 platform) and Generate.
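If you prefer the command line over the CMake GUI, an equivalent configure-and-build might look like this (the generator name matches VS 2022; adjust the paths and configuration to your setup):

cd yolov8_tensorrt
mkdir build
cd build
cmake .. -G "Visual Studio 17 2022" -A x64
cmake --build . --config Debug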
Pitfall 4: "No CUDA toolset found." CMake cannot locate the CUDA toolset:
The C compiler identification is MSVC 19.38.33133.0
The CXX compiler identification is MSVC 19.38.33133.0
Detecting C compiler ABI info
Detecting C compiler ABI info - done
Check for working C compiler: E:/vs2022/Community/VC/Tools/MSVC/14.38.33130/bin/Hostx64/x64/cl.exe - skipped
Detecting C compile features
Detecting C compile features - done
Detecting CXX compiler ABI info
Detecting CXX compiler ABI info - done
Check for working CXX compiler: E:/vs2022/Community/VC/Tools/MSVC/14.38.33130/bin/Hostx64/x64/cl.exe - skipped
Detecting CXX compile features
Detecting CXX compile features - done
CMake Warning (dev) at CMakeLists.txt:15 (find_package):
Policy CMP0146 is not set: The FindCUDA module is removed. Run "cmake
--help-policy CMP0146" for policy details. Use the cmake_policy command to
set the policy and suppress this warning.
This warning is for project developers. Use -Wno-dev to suppress it.
Found CUDA: D:/CUDA (found version "12.0")
libraries: D:/CUDA/lib/x64/cudart_static.lib
include path: D:/CUDA/include
CMake Error at D:/cmake/share/cmake-3.28/Modules/CMakeDetermineCompilerId.cmake:529 (message):
No CUDA toolset found.
Call Stack (most recent call first):
D:/cmake/share/cmake-3.28/Modules/CMakeDetermineCompilerId.cmake:8 (CMAKE_DETERMINE_COMPILER_ID_BUILD)
D:/cmake/share/cmake-3.28/Modules/CMakeDetermineCompilerId.cmake:53 (__determine_compiler_id_test)
D:/cmake/share/cmake-3.28/Modules/CMakeDetermineCUDACompiler.cmake:135 (CMAKE_DETERMINE_COMPILER_ID)
CMakeLists.txt:20 (enable_language)
Configuring incomplete, errors occurred!
Fix 4: this error means CMake's Visual Studio generator cannot find the CUDA build customizations. Reinstalling CUDA after Visual Studio (see Fix 1) resolves it; alternatively, copy the files from CUDA's extras\visual_studio_integration\MSBuildExtensions folder into your Visual Studio BuildCustomizations directory.
Pitfall 5: cuDNN cannot be found:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "E:\Anaconda3\Lib\site-packages\tensorrt\__init__.py", line 127, in <module>
ctypes.CDLL(find_lib(lib))
^^^^^^^^^^^^^
File "E:\Anaconda3\Lib\site-packages\tensorrt\__init__.py", line 81, in find_lib
raise FileNotFoundError(
FileNotFoundError: Could not find: cudnn64_8.dll. Is it on your PATH?
Note: Paths searched were:
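A quick way to see whether the DLL is actually visible on PATH is Windows' where command:

where cudnn64_8.dll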
Fix 5: as the error message suggests, put the directory containing cudnn64_8.dll on your PATH (see Fix 3). After that, Configure and Generate should both succeed; the remaining CMake warnings can be ignored. Then click Open Project.
Pitfall 6: clicking Open Project in CMake does nothing.
Fix 6: in the generated build folder, locate yolov8.sln, right-click it, and choose Open with → Visual Studio 2022.
In Solution Explorer, right-click the solution, open its properties, and select yolov8 as the startup project.
Open main.cpp.
First, comment out the line below; with a non-empty wts_name, the program serializes the model and generates the .engine file:
//wts_name = "";
Run the program with the line commented out to produce the engine.
Then remove the comment and run again; with wts_name empty, the program skips serialization, loads the generated .engine, and runs inference:
wts_name = "";
The test video is quite short; on longer videos the FPS is around 100.
Add the FPS-measurement code below (FPS is averaged over 10 frames to smooth out per-frame jitter):
// Requires <chrono> and <opencv2/opencv.hpp>; cap, stream, context, device_buffers,
// image_device, output_buffer_host, and labels are set up earlier in main.cpp.
int frame_counter = 0;
double fps = 0.0;
auto t1 = std::chrono::high_resolution_clock::now();
cv::Mat image;
while (cv::waitKey(1) != 27) {  // exit on ESC
    cap >> image;
    if (image.empty()) {
        std::cerr << "Error: Image not loaded or end of video." << std::endl;
        break;  // or continue, depending on your logic
    }
    auto t_beg = std::chrono::high_resolution_clock::now();
    float scale = 1.0;
    int img_size = image.cols * image.rows * 3;
    // Upload the frame, preprocess on the GPU, run inference, download the results
    cudaMemcpyAsync(image_device, image.data, img_size, cudaMemcpyHostToDevice, stream);
    preprocess(image_device, image.cols, image.rows, device_buffers[0], kInputW, kInputH, stream, scale);
    context->enqueue(kBatchSize, (void**)device_buffers, stream, nullptr);
    cudaMemcpyAsync(output_buffer_host, device_buffers[1], kBatchSize * kOutputSize * sizeof(float), cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);
    std::vector<Detection> res;
    NMS(res, output_buffer_host, kConfThresh, kNmsThresh);
    // Compute FPS, updated once every 10 frames
    frame_counter++;
    if (frame_counter % 10 == 0) {
        auto t2 = std::chrono::high_resolution_clock::now();
        auto time_span = std::chrono::duration_cast<std::chrono::duration<double>>(t2 - t1);
        fps = frame_counter / time_span.count();
        t1 = t2;
        frame_counter = 0;
    }
    drawBbox(image, res, scale, labels);
    // Draw the FPS onto the frame
    cv::putText(image, "FPS: " + std::to_string(fps), cv::Point(10, 30), cv::FONT_HERSHEY_SIMPLEX, 1, cv::Scalar(0, 255, 0), 2);
    auto t_end = std::chrono::high_resolution_clock::now();
    cv::imshow("Inference", image);
    float total_inf = std::chrono::duration<float, std::milli>(t_end - t_beg).count();
    std::cout << "Inference time: " << int(total_inf) << " ms" << std::endl;
}
// cv::waitKey();
cv::destroyAllWindows();
That concludes this walkthrough of deploying YOLOv8 with TensorRT in C++, pitfalls and fixes included.