Contents:
I. Preparing the Tools
II. Flashing the Image
III. Setting Up the Environment
IV. Test-Running YOLOv5
V. Deploying YOLOv5 with TensorRT
Preface:
This is a write-up of my own hands-on deployment on a Jetson Nano, for anyone who needs to do the same for work or study. It covers the errors and pitfalls I hit along the way; I hope it helps. :)
I. Preparing the Tools
- Card reader
- microSD card
- Small screwdriver
- Ethernet cable (changing the system language requires a network connection)
II. Flashing the Image
Flashing the image wipes the SD card completely; much like reinstalling the operating system on a PC, the card is formatted first.
After plugging in the card reader, the card is detected automatically. My computer showed several removable-drive entries for the one card; that is normal, and you only need to format one of them.
1. Download the system image on your local PC from the official NVIDIA site. Pick the version carefully, or you will run into trouble later; I used JetPack 4.4.1, available here:
JetPack SDK 4.4.1 archive | NVIDIA Developer
2. Unpack the downloaded archive.
3. Next, download the SD-card flashing tool (balenaEtcher) from:
Get Started With Jetson Nano Developer Kit | NVIDIA Developer
The download is a .exe installer; just run it.
4. Flash the image.
Launch the Etcher you just installed and select the image; the SD card is detected automatically. Then click Flash.
During flashing, Etcher makes two passes over the card (a write pass and a validation pass), taking about 20 minutes in total.
5. When flashing finishes, eject the card reader, insert the card into the Nano, and power it on.
6. On first boot, go through the basic setup (password, region, time); after that, the desktop looks much like a regular Ubuntu installation.
III. Setting Up the Environment
1. Configure CUDA
Open a terminal (Ctrl+Alt+T) and run:
sudo gedit ~/.bashrc
Enter your password to open the file.
Scroll to the bottom of the file, add the following lines, then press Ctrl+S to save:
export CUDA_HOME=/usr/local/cuda-10.2
export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda-10.2/bin:$PATH
Verify that CUDA is configured correctly (open a new terminal, or run `source ~/.bashrc` first, so the exports take effect):
nvcc -V
If the compiler version banner appears, the configuration succeeded.
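If you prefer scripting this check, the same PATH lookup can be done from Python's standard library. A minimal sketch (the `/usr/local/cuda-10.2/bin/nvcc` location mentioned in the comment is the path the exports above are expected to produce, not something this snippet guarantees):

```python
import shutil

def tool_on_path(name):
    """Return the resolved path of an executable, or None if it is not on PATH."""
    return shutil.which(name)

# After opening a new terminal (so ~/.bashrc is re-read), nvcc should
# resolve to /usr/local/cuda-10.2/bin/nvcc on the Nano.
print(tool_on_path("nvcc"))
```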
*Error*
After saving, I sometimes got a read error or a "cannot save" prompt, and `nvcc -V` then reported that the command could not be found. If that happens, repeat the commands a few times, or close the terminal, open a new one, and try again.
2. Configure conda
The Jetson Nano B01 is an aarch64 machine, unlike typical x86 Windows and Linux boxes, so the stock Anaconda installer will not run on it. Install Archiconda, a community Conda build for aarch64, instead.
Download it with:
wget https://github.com/Archiconda/build-tools/releases/download/0.2.3/Archiconda3-0.2.3-Linux-aarch64.sh
Wait for the download to finish.
*Error*
The download may stall partway through, or fail right at 99%.
In my case this was caused by the stack size limit being too small; raising it fixed the problem.
First check your current limit with:
ulimit -a
The "stack size" entry is the one to look at. Mine was 8192 KB; it needs to be raised to 102400.
Change it directly with `ulimit -s 102400`, then re-run `ulimit -a` to confirm the new value, and retry the download.
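The same limit can also be read programmatically. Here is a small Python sketch using the standard `resource` module (Linux/POSIX only) to confirm the change; note that `ulimit -s` works in kilobytes while `getrlimit` reports bytes:

```python
import resource

# Soft and hard limits for the process stack, in bytes
# (RLIM_INFINITY means unlimited).
soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
print("stack soft limit:",
      "unlimited" if soft == resource.RLIM_INFINITY else f"{soft // 1024} KB")
```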
Once the installer has downloaded, run:
bash Archiconda3-0.2.3-Linux-aarch64.sh
and click through the installer prompts.
After installation, add conda to your PATH:
sudo gedit ~/.bashrc
Append this line at the end of the file you opened earlier:
export PATH=~/archiconda3/bin:$PATH
Check the conda version:
conda -V
3. Create your own virtual environment:
conda create -n xxx python=3.6   # create a Python 3.6 environment; replace xxx with your environment name
conda activate xxx               # enter the environment
conda deactivate                 # leave the environment
4. Switch to a local package mirror
First back up sources.list (the command prints nothing on success):
sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak
Open sources.list:
sudo gedit /etc/apt/sources.list
Select all of its contents (Ctrl+A), delete them, paste in the following lines, then save and exit.
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic main multiverse restricted universe
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-security main multiverse restricted universe
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-updates main multiverse restricted universe
deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-backports main multiverse restricted universe
deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic main multiverse restricted universe
deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-security main multiverse restricted universe
deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-updates main multiverse restricted universe
deb-src http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ bionic-backports main multiverse restricted universe
Then refresh the package lists:
sudo apt-get update
Update the installed packages:
sudo apt-get upgrade
Upgrade everything, resolving changed dependencies along the way:
sudo apt-get dist-upgrade
5. Install pip
sudo apt-get install python3-pip libopenblas-base libopenmpi-dev
Upgrade pip to the latest version:
pip3 install --upgrade pip   # skip this if pip is already up to date
6. Download torch and torchvision
On the Nano you need the NVIDIA-built wheel for your JetPack version; download it from NVIDIA's forum thread. I used PyTorch 1.8.0:
PyTorch for Jetson - version 1.10 now available - Jetson Nano - NVIDIA Developer Forums
After downloading on your PC, I recommend MobaXterm for transferring files to the board; alternatively, copy them over on a USB drive.
For MobaXterm setup and usage, see this tutorial: MobaXterm(終端工具)下載&安裝&使用教程_蝸牛也不慢......的博客-CSDN博客
Once connected, unpack what you downloaded locally and transfer it over; the home directory is a good destination.
7. Install torch
Open a terminal in the directory holding the wheel (right-click, Open Terminal), activate the virtual environment you created, and run:
sudo apt-get install python3-pip libopenblas-base libopenmpi-dev
pip install torch-1.8.0-cp36-cp36m-linux_aarch64.whl
If network problems keep the second command from downloading its dependencies, append the Tsinghua mirror to it:
-i https://pypi.tuna.tsinghua.edu.cn/simple
When installing torch, make sure numpy is installed as well; without it, the torch install appears to succeed, but `conda list` will not show it.
sudo apt install python3-numpy
Test whether torch installed correctly (start `python3`, then):
import torch
print(torch.__version__)
print(torch.cuda.is_available())   # should print True on the Nano
*Error*
The test may abort with "Illegal instruction (core dumped)". The workaround is to set this variable before starting Python (add it to ~/.bashrc to make it permanent):
export OPENBLAS_CORETYPE=ARMV8
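The variable must be set before numpy is first imported. If you cannot change the shell environment, the same workaround can be applied at the top of a script; a minimal sketch of that alternative:

```python
import os

# Must run before the first `import numpy` (or `import torch`):
# on some aarch64 boards OpenBLAS otherwise selects an unsupported
# CPU kernel, causing "Illegal instruction (core dumped)".
os.environ["OPENBLAS_CORETYPE"] = "ARMV8"

import numpy as np  # now safe to import
print(np.__version__)
```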
8. Install torchvision. Run the following commands one at a time (if the third one fails, continue with the fourth and fifth; if it succeeds, go straight to `cd ..`):
cd torchvision
export BUILD_VERSION=0.9.0
sudo python setup.py install
python setup.py build
python setup.py install
cd ..   # note the space between cd and ..
*Error*
If testing reports that PIL is missing, install pillow with the commands below (if the second one hits a permission error, prefix it with sudo or append --user):
sudo apt-get install libjpeg8 libjpeg62-dev libfreetype6 libfreetype6-dev
python3 -m pip install -i https://mirrors.aliyun.com/pypi/simple pillow
That completes the environment setup; next, let's get YOLOv5 running.
IV. Test-Running YOLOv5
1. Download the release you need from GitHub; I used v5.0. Be sure to download the matching weights at the same time. The code is here:
ultralytics/yolov5 at v5.0 (github.com)
Pick the tag on that page.
The weights are here:
Releases · ultralytics/yolov5 (github.com)
2. After downloading (or transferring) yolov5 and the weight file to the home directory, cd into the yolov5 directory and install the dependencies from a terminal:
pip install -r requirements.txt -i https://pypi.tuna.tsinghua.edu.cn/simple
*Error*
If the install fails here, it is usually because the numpy version is too new; downgrading it fixes the problem.
Pin numpy to 1.19.4:
pip install numpy==1.19.4 -i https://pypi.tuna.tsinghua.edu.cn/simple
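To confirm the pin took effect, version strings can be compared numerically from Python. A small self-contained sketch (nothing here is Nano-specific):

```python
def version_tuple(v):
    """Turn '1.19.4' into (1, 19, 4) so versions compare numerically."""
    return tuple(int(part) for part in v.split(".")[:3])

# numpy 1.20+ is what broke the install for me; 1.19.4 sits below that.
assert version_tuple("1.19.4") < version_tuple("1.20.0")

# On the Nano you would check the real package instead:
# import numpy; print(version_tuple(numpy.__version__))
print(version_tuple("1.19.4"))
```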
After fixing that, re-run the requirements command and it will fetch the remaining packages. Most install quickly, but opencv takes a very, very long time. Seeing "Building wheel for opencv-python (pyproject.toml)..." is normal; it means OpenCV is compiling from source, so just keep waiting. My build took over two hours.
3. After the long wait, once all dependencies are installed, move the weight file into the yolov5 root directory, open a terminal there, and run:
python3 detect.py --weights yolov5s.pt
If that runs cleanly, we are ready to start the actual deployment.
V. Deploying YOLOv5 with TensorRT
1. Download the tensorrtx implementation of yolov5, making sure you grab the v5.0 version to match the model:
mirrors / wang-xinyu / tensorrtx · GitCode (https://gitcode.net/mirrors/wang-xinyu/tensorrtx)
2. In the downloaded tree, find gen_wts.py, copy it into your yolov5 folder, open a terminal there, and run the command below to generate the yolov5s.wts file:
python3 gen_wts.py --w yolov5s.pt
Note: adjust the weight path to match your own setup; gen_wts.py itself normally sits at tensorrtx-yolov5-v5.0 -> yolov5 -> gen_wts.py.
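For the curious: as I understand it, the .wts file is a plain-text weight dump in tensorrtx's format, with a tensor count on the first line and then one line per tensor holding its name, element count, and big-endian float32 hex values. A minimal sketch of that format using made-up weights (this is an illustration, not the real gen_wts.py):

```python
import struct

def write_wts(tensors, path):
    """Write {layer_name: flat list of floats} in the tensorrtx .wts text format."""
    with open(path, "w") as f:
        f.write(f"{len(tensors)}\n")
        for name, values in tensors.items():
            # each float becomes 8 hex chars of its big-endian IEEE-754 encoding
            hex_vals = " ".join(struct.pack(">f", v).hex() for v in values)
            f.write(f"{name} {len(values)} {hex_vals}\n")

write_wts({"model.0.conv.weight": [0.5, -1.0]}, "demo.wts")
print(open("demo.wts").read())
# 1
# model.0.conv.weight 2 3f000000 bf800000
```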
3. Open yololayer.h and edit the number of classes to match your model, as well as the input image size (keep it a multiple of 32 where possible).
4. Create a build directory in the current folder and build the project:
mkdir build   # create the build folder
cd build      # enter it
cmake ..      # configure the project
# copy the .wts file generated above into the build folder
make
5. With the yolov5s.wts file copied into the build folder, open a terminal there and serialize the TensorRT engine:
sudo ./yolov5 -s yolov5s.wts yolov5s.engine s
6. Create a sample folder under tensorrtx-yolov5-v5.0/yolov5, put a test image in it (ideally one containing people), and run:
sudo ./yolov5 -d yolov5s.engine ../sample
Note: this writes an annotated copy of the image into the build folder. The results may not look great on a single image; that is expected at this stage.
7. A test image does not show much, and a real deployment delivered to users runs off a camera feed, so tensorrtx-yolov5-v5.0 -> yolov5.cpp needs to be modified. The camera-enabled version below follows tutorials shared online:
#include <iostream>
#include <chrono>
#include "cuda_utils.h"
#include "logging.h"
#include "common.hpp"
#include "utils.h"
#include "calibrator.h"
#define USE_FP16 // set USE_INT8 or USE_FP16 or USE_FP32
#define DEVICE 0 // GPU id
#define NMS_THRESH 0.4
#define CONF_THRESH 0.5
#define BATCH_SIZE 1
// stuff we know about the network and the input/output blobs
static const int INPUT_H = Yolo::INPUT_H;
static const int INPUT_W = Yolo::INPUT_W;
static const int CLASS_NUM = Yolo::CLASS_NUM;
static const int OUTPUT_SIZE = Yolo::MAX_OUTPUT_BBOX_COUNT * sizeof(Yolo::Detection) / sizeof(float) + 1; // we assume the yololayer outputs no more than MAX_OUTPUT_BBOX_COUNT boxes that conf >= 0.1
const char* INPUT_BLOB_NAME = "data";
const char* OUTPUT_BLOB_NAME = "prob";
static Logger gLogger;
// change these to your own class names
char *my_classes[]={ "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
"fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
"elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
"skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard","surfboard",
"tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
"sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
"potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
"microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
"hair drier", "toothbrush" };
static int get_width(int x, float gw, int divisor = 8) {
//return math.ceil(x / divisor) * divisor
if (int(x * gw) % divisor == 0) {
return int(x * gw);
}
return (int(x * gw / divisor) + 1) * divisor;
}
static int get_depth(int x, float gd) {
if (x == 1) {
return 1;
}
else {
return round(x * gd) > 1 ? round(x * gd) : 1;
}
}
// create the engine and network
ICudaEngine* build_engine(unsigned int maxBatchSize, IBuilder* builder, IBuilderConfig* config, DataType dt, float& gd, float& gw, std::string& wts_name) {
INetworkDefinition* network = builder->createNetworkV2(0U);
// Create input tensor of shape {3, INPUT_H, INPUT_W} with name INPUT_BLOB_NAME
ITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{ 3, INPUT_H, INPUT_W });
assert(data);
std::map<std::string, Weights> weightMap = loadWeights(wts_name);
/* ------ yolov5 backbone------ */
auto focus0 = focus(network, weightMap, *data, 3, get_width(64, gw), 3, "model.0");
auto conv1 = convBlock(network, weightMap, *focus0->getOutput(0), get_width(128, gw), 3, 2, 1, "model.1");
auto bottleneck_CSP2 = C3(network, weightMap, *conv1->getOutput(0), get_width(128, gw), get_width(128, gw), get_depth(3, gd), true, 1, 0.5, "model.2");
auto conv3 = convBlock(network, weightMap, *bottleneck_CSP2->getOutput(0), get_width(256, gw), 3, 2, 1, "model.3");
auto bottleneck_csp4 = C3(network, weightMap, *conv3->getOutput(0), get_width(256, gw), get_width(256, gw), get_depth(9, gd), true, 1, 0.5, "model.4");
auto conv5 = convBlock(network, weightMap, *bottleneck_csp4->getOutput(0), get_width(512, gw), 3, 2, 1, "model.5");
auto bottleneck_csp6 = C3(network, weightMap, *conv5->getOutput(0), get_width(512, gw), get_width(512, gw), get_depth(9, gd), true, 1, 0.5, "model.6");
auto conv7 = convBlock(network, weightMap, *bottleneck_csp6->getOutput(0), get_width(1024, gw), 3, 2, 1, "model.7");
auto spp8 = SPP(network, weightMap, *conv7->getOutput(0), get_width(1024, gw), get_width(1024, gw), 5, 9, 13, "model.8");
/* ------ yolov5 head ------ */
auto bottleneck_csp9 = C3(network, weightMap, *spp8->getOutput(0), get_width(1024, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.9");
auto conv10 = convBlock(network, weightMap, *bottleneck_csp9->getOutput(0), get_width(512, gw), 1, 1, 1, "model.10");
auto upsample11 = network->addResize(*conv10->getOutput(0));
assert(upsample11);
upsample11->setResizeMode(ResizeMode::kNEAREST);
upsample11->setOutputDimensions(bottleneck_csp6->getOutput(0)->getDimensions());
ITensor* inputTensors12[] = { upsample11->getOutput(0), bottleneck_csp6->getOutput(0) };
auto cat12 = network->addConcatenation(inputTensors12, 2);
auto bottleneck_csp13 = C3(network, weightMap, *cat12->getOutput(0), get_width(1024, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.13");
auto conv14 = convBlock(network, weightMap, *bottleneck_csp13->getOutput(0), get_width(256, gw), 1, 1, 1, "model.14");
auto upsample15 = network->addResize(*conv14->getOutput(0));
assert(upsample15);
upsample15->setResizeMode(ResizeMode::kNEAREST);
upsample15->setOutputDimensions(bottleneck_csp4->getOutput(0)->getDimensions());
ITensor* inputTensors16[] = { upsample15->getOutput(0), bottleneck_csp4->getOutput(0) };
auto cat16 = network->addConcatenation(inputTensors16, 2);
auto bottleneck_csp17 = C3(network, weightMap, *cat16->getOutput(0), get_width(512, gw), get_width(256, gw), get_depth(3, gd), false, 1, 0.5, "model.17");
// yolo layer 0
IConvolutionLayer* det0 = network->addConvolutionNd(*bottleneck_csp17->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.24.m.0.weight"], weightMap["model.24.m.0.bias"]);
auto conv18 = convBlock(network, weightMap, *bottleneck_csp17->getOutput(0), get_width(256, gw), 3, 2, 1, "model.18");
ITensor* inputTensors19[] = { conv18->getOutput(0), conv14->getOutput(0) };
auto cat19 = network->addConcatenation(inputTensors19, 2);
auto bottleneck_csp20 = C3(network, weightMap, *cat19->getOutput(0), get_width(512, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.20");
//yolo layer 1
IConvolutionLayer* det1 = network->addConvolutionNd(*bottleneck_csp20->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.24.m.1.weight"], weightMap["model.24.m.1.bias"]);
auto conv21 = convBlock(network, weightMap, *bottleneck_csp20->getOutput(0), get_width(512, gw), 3, 2, 1, "model.21");
ITensor* inputTensors22[] = { conv21->getOutput(0), conv10->getOutput(0) };
auto cat22 = network->addConcatenation(inputTensors22, 2);
auto bottleneck_csp23 = C3(network, weightMap, *cat22->getOutput(0), get_width(1024, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.23");
IConvolutionLayer* det2 = network->addConvolutionNd(*bottleneck_csp23->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.24.m.2.weight"], weightMap["model.24.m.2.bias"]);
auto yolo = addYoLoLayer(network, weightMap, "model.24", std::vector<IConvolutionLayer*>{det0, det1, det2});
yolo->getOutput(0)->setName(OUTPUT_BLOB_NAME);
network->markOutput(*yolo->getOutput(0));
// Build engine
builder->setMaxBatchSize(maxBatchSize);
config->setMaxWorkspaceSize(16 * (1 << 20)); // 16MB
#if defined(USE_FP16)
config->setFlag(BuilderFlag::kFP16);
#elif defined(USE_INT8)
std::cout << "Your platform support int8: " << (builder->platformHasFastInt8() ? "true" : "false") << std::endl;
assert(builder->platformHasFastInt8());
config->setFlag(BuilderFlag::kINT8);
Int8EntropyCalibrator2* calibrator = new Int8EntropyCalibrator2(1, INPUT_W, INPUT_H, "./coco_calib/", "int8calib.table", INPUT_BLOB_NAME);
config->setInt8Calibrator(calibrator);
#endif
std::cout << "Building engine, please wait for a while..." << std::endl;
ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
std::cout << "Build engine successfully!" << std::endl;
// Don't need the network any more
network->destroy();
// Release host memory
for (auto& mem : weightMap)
{
free((void*)(mem.second.values));
}
return engine;
}
ICudaEngine* build_engine_p6(unsigned int maxBatchSize, IBuilder* builder, IBuilderConfig* config, DataType dt, float& gd, float& gw, std::string& wts_name) {
INetworkDefinition* network = builder->createNetworkV2(0U);
// Create input tensor of shape {3, INPUT_H, INPUT_W} with name INPUT_BLOB_NAME
ITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{ 3, INPUT_H, INPUT_W });
assert(data);
std::map<std::string, Weights> weightMap = loadWeights(wts_name);
/* ------ yolov5 backbone------ */
auto focus0 = focus(network, weightMap, *data, 3, get_width(64, gw), 3, "model.0");
auto conv1 = convBlock(network, weightMap, *focus0->getOutput(0), get_width(128, gw), 3, 2, 1, "model.1");
auto c3_2 = C3(network, weightMap, *conv1->getOutput(0), get_width(128, gw), get_width(128, gw), get_depth(3, gd), true, 1, 0.5, "model.2");
auto conv3 = convBlock(network, weightMap, *c3_2->getOutput(0), get_width(256, gw), 3, 2, 1, "model.3");
auto c3_4 = C3(network, weightMap, *conv3->getOutput(0), get_width(256, gw), get_width(256, gw), get_depth(9, gd), true, 1, 0.5, "model.4");
auto conv5 = convBlock(network, weightMap, *c3_4->getOutput(0), get_width(512, gw), 3, 2, 1, "model.5");
auto c3_6 = C3(network, weightMap, *conv5->getOutput(0), get_width(512, gw), get_width(512, gw), get_depth(9, gd), true, 1, 0.5, "model.6");
auto conv7 = convBlock(network, weightMap, *c3_6->getOutput(0), get_width(768, gw), 3, 2, 1, "model.7");
auto c3_8 = C3(network, weightMap, *conv7->getOutput(0), get_width(768, gw), get_width(768, gw), get_depth(3, gd), true, 1, 0.5, "model.8");
auto conv9 = convBlock(network, weightMap, *c3_8->getOutput(0), get_width(1024, gw), 3, 2, 1, "model.9");
auto spp10 = SPP(network, weightMap, *conv9->getOutput(0), get_width(1024, gw), get_width(1024, gw), 3, 5, 7, "model.10");
auto c3_11 = C3(network, weightMap, *spp10->getOutput(0), get_width(1024, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.11");
/* ------ yolov5 head ------ */
auto conv12 = convBlock(network, weightMap, *c3_11->getOutput(0), get_width(768, gw), 1, 1, 1, "model.12");
auto upsample13 = network->addResize(*conv12->getOutput(0));
assert(upsample13);
upsample13->setResizeMode(ResizeMode::kNEAREST);
upsample13->setOutputDimensions(c3_8->getOutput(0)->getDimensions());
ITensor* inputTensors14[] = { upsample13->getOutput(0), c3_8->getOutput(0) };
auto cat14 = network->addConcatenation(inputTensors14, 2);
auto c3_15 = C3(network, weightMap, *cat14->getOutput(0), get_width(1536, gw), get_width(768, gw), get_depth(3, gd), false, 1, 0.5, "model.15");
auto conv16 = convBlock(network, weightMap, *c3_15->getOutput(0), get_width(512, gw), 1, 1, 1, "model.16");
auto upsample17 = network->addResize(*conv16->getOutput(0));
assert(upsample17);
upsample17->setResizeMode(ResizeMode::kNEAREST);
upsample17->setOutputDimensions(c3_6->getOutput(0)->getDimensions());
ITensor* inputTensors18[] = { upsample17->getOutput(0), c3_6->getOutput(0) };
auto cat18 = network->addConcatenation(inputTensors18, 2);
auto c3_19 = C3(network, weightMap, *cat18->getOutput(0), get_width(1024, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.19");
auto conv20 = convBlock(network, weightMap, *c3_19->getOutput(0), get_width(256, gw), 1, 1, 1, "model.20");
auto upsample21 = network->addResize(*conv20->getOutput(0));
assert(upsample21);
upsample21->setResizeMode(ResizeMode::kNEAREST);
upsample21->setOutputDimensions(c3_4->getOutput(0)->getDimensions());
ITensor* inputTensors21[] = { upsample21->getOutput(0), c3_4->getOutput(0) };
auto cat22 = network->addConcatenation(inputTensors21, 2);
auto c3_23 = C3(network, weightMap, *cat22->getOutput(0), get_width(512, gw), get_width(256, gw), get_depth(3, gd), false, 1, 0.5, "model.23");
auto conv24 = convBlock(network, weightMap, *c3_23->getOutput(0), get_width(256, gw), 3, 2, 1, "model.24");
ITensor* inputTensors25[] = { conv24->getOutput(0), conv20->getOutput(0) };
auto cat25 = network->addConcatenation(inputTensors25, 2);
auto c3_26 = C3(network, weightMap, *cat25->getOutput(0), get_width(1024, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.26");
auto conv27 = convBlock(network, weightMap, *c3_26->getOutput(0), get_width(512, gw), 3, 2, 1, "model.27");
ITensor* inputTensors28[] = { conv27->getOutput(0), conv16->getOutput(0) };
auto cat28 = network->addConcatenation(inputTensors28, 2);
auto c3_29 = C3(network, weightMap, *cat28->getOutput(0), get_width(1536, gw), get_width(768, gw), get_depth(3, gd), false, 1, 0.5, "model.29");
auto conv30 = convBlock(network, weightMap, *c3_29->getOutput(0), get_width(768, gw), 3, 2, 1, "model.30");
ITensor* inputTensors31[] = { conv30->getOutput(0), conv12->getOutput(0) };
auto cat31 = network->addConcatenation(inputTensors31, 2);
auto c3_32 = C3(network, weightMap, *cat31->getOutput(0), get_width(2048, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.32");
/* ------ detect ------ */
IConvolutionLayer* det0 = network->addConvolutionNd(*c3_23->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.0.weight"], weightMap["model.33.m.0.bias"]);
IConvolutionLayer* det1 = network->addConvolutionNd(*c3_26->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.1.weight"], weightMap["model.33.m.1.bias"]);
IConvolutionLayer* det2 = network->addConvolutionNd(*c3_29->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.2.weight"], weightMap["model.33.m.2.bias"]);
IConvolutionLayer* det3 = network->addConvolutionNd(*c3_32->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.3.weight"], weightMap["model.33.m.3.bias"]);
auto yolo = addYoLoLayer(network, weightMap, "model.33", std::vector<IConvolutionLayer*>{det0, det1, det2, det3});
yolo->getOutput(0)->setName(OUTPUT_BLOB_NAME);
network->markOutput(*yolo->getOutput(0));
// Build engine
builder->setMaxBatchSize(maxBatchSize);
config->setMaxWorkspaceSize(16 * (1 << 20)); // 16MB
#if defined(USE_FP16)
config->setFlag(BuilderFlag::kFP16);
#elif defined(USE_INT8)
std::cout << "Your platform support int8: " << (builder->platformHasFastInt8() ? "true" : "false") << std::endl;
assert(builder->platformHasFastInt8());
config->setFlag(BuilderFlag::kINT8);
Int8EntropyCalibrator2* calibrator = new Int8EntropyCalibrator2(1, INPUT_W, INPUT_H, "./coco_calib/", "int8calib.table", INPUT_BLOB_NAME);
config->setInt8Calibrator(calibrator);
#endif
std::cout << "Building engine, please wait for a while..." << std::endl;
ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
std::cout << "Build engine successfully!" << std::endl;
// Don't need the network any more
network->destroy();
// Release host memory
for (auto& mem : weightMap)
{
free((void*)(mem.second.values));
}
return engine;
}
void APIToModel(unsigned int maxBatchSize, IHostMemory** modelStream, float& gd, float& gw, std::string& wts_name) {
// Create builder
IBuilder* builder = createInferBuilder(gLogger);
IBuilderConfig* config = builder->createBuilderConfig();
// Create model to populate the network, then set the outputs and create an engine
ICudaEngine* engine = build_engine(maxBatchSize, builder, config, DataType::kFLOAT, gd, gw, wts_name);
assert(engine != nullptr);
// Serialize the engine
(*modelStream) = engine->serialize();
// Close everything down
engine->destroy();
builder->destroy();
config->destroy();
}
void doInference(IExecutionContext& context, cudaStream_t& stream, void** buffers, float* input, float* output, int batchSize) {
// DMA input batch data to device, infer on the batch asynchronously, and DMA output back to host
CUDA_CHECK(cudaMemcpyAsync(buffers[0], input, batchSize * 3 * INPUT_H * INPUT_W * sizeof(float), cudaMemcpyHostToDevice, stream));
context.enqueue(batchSize, buffers, stream, nullptr);
CUDA_CHECK(cudaMemcpyAsync(output, buffers[1], batchSize * OUTPUT_SIZE * sizeof(float), cudaMemcpyDeviceToHost, stream));
cudaStreamSynchronize(stream);
}
bool parse_args(int argc, char** argv, std::string& engine) {
if (argc < 3) return false;
if (std::string(argv[1]) == "-v" && argc == 3) {
engine = std::string(argv[2]);
}
else {
return false;
}
return true;
}
int main(int argc, char** argv) {
cudaSetDevice(DEVICE);
//std::string wts_name = "";
std::string engine_name = "";
//float gd = 0.0f, gw = 0.0f;
//std::string img_dir;
if (!parse_args(argc, argv, engine_name)) {
std::cerr << "arguments not right!" << std::endl;
std::cerr << "./yolov5 -v [.engine] // run inference with camera" << std::endl;
return -1;
}
std::ifstream file(engine_name, std::ios::binary);
if (!file.good()) {
std::cerr << " read " << engine_name << " error! " << std::endl;
return -1;
}
char* trtModelStream{ nullptr };
size_t size = 0;
file.seekg(0, file.end);
size = file.tellg();
file.seekg(0, file.beg);
trtModelStream = new char[size];
assert(trtModelStream);
file.read(trtModelStream, size);
file.close();
// prepare input data ---------------------------
static float data[BATCH_SIZE * 3 * INPUT_H * INPUT_W];
//for (int i = 0; i < 3 * INPUT_H * INPUT_W; i++)
// data[i] = 1.0;
static float prob[BATCH_SIZE * OUTPUT_SIZE];
IRuntime* runtime = createInferRuntime(gLogger);
assert(runtime != nullptr);
ICudaEngine* engine = runtime->deserializeCudaEngine(trtModelStream, size);
assert(engine != nullptr);
IExecutionContext* context = engine->createExecutionContext();
assert(context != nullptr);
delete[] trtModelStream;
assert(engine->getNbBindings() == 2);
void* buffers[2];
// In order to bind the buffers, we need to know the names of the input and output tensors.
// Note that indices are guaranteed to be less than IEngine::getNbBindings()
const int inputIndex = engine->getBindingIndex(INPUT_BLOB_NAME);
const int outputIndex = engine->getBindingIndex(OUTPUT_BLOB_NAME);
assert(inputIndex == 0);
assert(outputIndex == 1);
// Create GPU buffers on device
CUDA_CHECK(cudaMalloc(&buffers[inputIndex], BATCH_SIZE * 3 * INPUT_H * INPUT_W * sizeof(float)));
CUDA_CHECK(cudaMalloc(&buffers[outputIndex], BATCH_SIZE * OUTPUT_SIZE * sizeof(float)));
// Create stream
cudaStream_t stream;
CUDA_CHECK(cudaStreamCreate(&stream));
// read from a local video file instead:
//cv::VideoCapture capture("/home/nano/Videos/video.mp4");
// open the local USB camera; device index 1 works on my board. If it errors, try 0.
cv::VideoCapture capture(1);
if (!capture.isOpened()) {
std::cout << "Error opening video stream or file" << std::endl;
return -1;
}
int key;
int fcount = 0;
while (1)
{
cv::Mat frame;
capture >> frame;
if (frame.empty())
{
std::cout << "Fail to read image from camera!" << std::endl;
break;
}
fcount++;
//if (fcount < BATCH_SIZE && f + 1 != (int)file_names.size()) continue;
for (int b = 0; b < fcount; b++) {
//cv::Mat img = cv::imread(img_dir + "/" + file_names[f - fcount + 1 + b]);
cv::Mat img = frame;
if (img.empty()) continue;
cv::Mat pr_img = preprocess_img(img, INPUT_W, INPUT_H); // letterbox BGR to RGB
int i = 0;
for (int row = 0; row < INPUT_H; ++row) {
uchar* uc_pixel = pr_img.data + row * pr_img.step;
for (int col = 0; col < INPUT_W; ++col) {
data[b * 3 * INPUT_H * INPUT_W + i] = (float)uc_pixel[2] / 255.0;
data[b * 3 * INPUT_H * INPUT_W + i + INPUT_H * INPUT_W] = (float)uc_pixel[1] / 255.0;
data[b * 3 * INPUT_H * INPUT_W + i + 2 * INPUT_H * INPUT_W] = (float)uc_pixel[0] / 255.0;
uc_pixel += 3;
++i;
}
}
}
// Run inference
auto start = std::chrono::system_clock::now(); // inference start time
doInference(*context, stream, buffers, data, prob, BATCH_SIZE);
auto end = std::chrono::system_clock::now(); // inference end time
//std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms" << std::endl;
int fps = 1000.0 / std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
std::vector<std::vector<Yolo::Detection>> batch_res(fcount);
for (int b = 0; b < fcount; b++) {
auto& res = batch_res[b];
nms(res, &prob[b * OUTPUT_SIZE], CONF_THRESH, NMS_THRESH);
}
for (int b = 0; b < fcount; b++) {
auto& res = batch_res[b];
//std::cout << res.size() << std::endl;
//cv::Mat img = cv::imread(img_dir + "/" + file_names[f - fcount + 1 + b]);
for (size_t j = 0; j < res.size(); j++) {
cv::Rect r = get_rect(frame, res[j].bbox);
cv::rectangle(frame, r, cv::Scalar(0x27, 0xC1, 0x36), 2);
std::string label = my_classes[(int)res[j].class_id];
cv::putText(frame, label, cv::Point(r.x, r.y - 1), cv::FONT_HERSHEY_PLAIN, 1.2, cv::Scalar(0xFF, 0xFF, 0xFF), 2);
std::string jetson_fps = "FPS: " + std::to_string(fps);
cv::putText(frame, jetson_fps, cv::Point(11, 80), cv::FONT_HERSHEY_PLAIN, 3, cv::Scalar(0, 0, 255), 2, cv::LINE_AA);
}
//cv::imwrite("_" + file_names[f - fcount + 1 + b], img);
}
cv::imshow("yolov5", frame);
key = cv::waitKey(1);
if (key == 'q') {
break;
}
fcount = 0;
}
capture.release();
// Release stream and buffers
cudaStreamDestroy(stream);
CUDA_CHECK(cudaFree(buffers[inputIndex]));
CUDA_CHECK(cudaFree(buffers[outputIndex]));
// Destroy the engine
context->destroy();
engine->destroy();
runtime->destroy();
return 0;
}
After editing the .cpp file, go back to the tensorrtx-yolov5-v5.0 -> yolov5 -> build folder, then rebuild and run:
make
sudo ./yolov5 -v yolov5s.engine
Once it runs, the camera feed opens with live detections drawn on it.
That wraps up this complete walkthrough of deploying YOLOv5 object detection on the Jetson Nano with TensorRT acceleration. I hope it helps; thanks for reading, and writing this up was not easy, so any encouragement is appreciated!