
Deploying YOLOv5 with TensorRT on Ubuntu 20.04 (C++ and Python interfaces)


This post covers deploying YOLOv5 with TensorRT on Ubuntu 20.04 using both the C++ and Python interfaces. (The next post will cover YOLOv7 deployment.)
I. Installing CUDA, cuDNN, TensorRT, and OpenCV

  1. CUDA installation
  2. cuDNN installation
  3. TensorRT installation
  4. OpenCV installation

II. YOLOv5 deployment

  1. File preparation
  2. Generating the wts file
  3. Generating the deployment engine
  4. Testing the deployed model on images
  5. Video detection

I. Installing CUDA, cuDNN, TensorRT, and OpenCV
1. CUDA installation

# CUDA = 10.2
# Choose to create the symlink; there is no need to install the driver here
sudo sh cuda_10.2.89_440.33.01_linux.run

# Check the CUDA version
cat /usr/local/cuda/version.txt

# Test CUDA; a successful installation ends with Result = PASS
cd /usr/local/cuda-10.2/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery

2. cuDNN installation

# cuDNN = 8.1.1.33
# Extract the archive, then copy the headers and libraries into the CUDA directories
sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
# Check the cuDNN version (for cuDNN 8.x the version macros are in cudnn_version.h)
cat /usr/local/cuda/include/cudnn_version.h | grep CUDNN_MAJOR -A 2
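If PyTorch with CUDA support is already installed (it will be needed for YOLOv5 later anyway), a quick Python sanity check can confirm that the runtime actually sees CUDA and cuDNN; this is only a convenience check, not part of the official install:

```python
# Sanity check, assuming a CUDA-enabled PyTorch is installed.
import torch

print(torch.cuda.is_available())        # expect True
print(torch.version.cuda)               # CUDA version PyTorch was built against
print(torch.backends.cudnn.version())   # e.g. 8101 for cuDNN 8.1.1
```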

3. TensorRT installation

# TensorRT = 7.2.3.4 (Ubuntu 18.04, CUDA 10.2, cuDNN 8.1)
# Extract the archive (use the full name of the downloaded tarball) and configure the environment variables
tar xzvf TensorRT
mv TensorRT-7.2.3.4 /usr/local/
sudo gedit ~/.bashrc
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/TensorRT-7.2.3.4/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/TensorRT-7.2.3.4/include

# Install the TensorRT Python wheel
cd /usr/local/TensorRT-7.2.3.4/
cd python
# Pick the wheel that matches your Python version
pip install tensorrt-7.2.3.4-cp38-none-linux_x86_64.whl

# Install graphsurgeon
cd ../graphsurgeon
pip install graphsurgeon-0.4.5-py2.py3-none-any.whl

# Install uff (only needed when using TensorRT together with TensorFlow)
cd ../uff
pip install uff-0.6.9-py2.py3-none-any.whl

# Copy the TensorRT libraries and headers into the system paths
cd ..
sudo cp -r ./lib/* /usr/lib
sudo cp -r ./include/* /usr/include

# Installation test
python  # enter the Python interpreter
import tensorrt
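Beyond a bare import, a slightly fuller check verifies the wheel version and that the native libraries on LD_LIBRARY_PATH can actually be loaded (a minimal sketch, assuming the 7.2.3.4 wheel installed above):

```python
# Verify the TensorRT Python installation.
import tensorrt as trt

print(trt.__version__)                   # should print 7.2.3.4
# Creating a builder exercises the native TensorRT libraries as well.
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
print("TensorRT builder created:", builder is not None)
```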

4. OpenCV installation
Download link: https://opencv.org/releases/page/4/

# Install the required packages
sudo apt install build-essential
sudo apt install cmake git libgtk2.0-dev pkg-config libavcodec-dev libavformat-dev libswscale-dev
sudo apt install python-dev python-numpy libtbb2 libtbb-dev libjpeg-dev libpng-dev libtiff-dev libjasper-dev libdc1394-22-dev

# If the third command fails with "Unable to locate package libjasper-dev",
# add the xenial-security repository and install libjasper1 (the dependency of libjasper-dev) first
sudo add-apt-repository "deb http://security.ubuntu.com/ubuntu xenial-security main"
sudo apt update
sudo apt upgrade
sudo apt install libjasper1 libjasper-dev

# opencv-3.4.2
# Download the archive for this version (about 87.3 MB) and extract it
unzip opencv-3.4.2.zip

# Move the sources and create a build directory
mv opencv-3.4.2 /usr/local/
cd /usr/local/opencv-3.4.2
mkdir build
cd build

# Configure the build
cmake -D CMAKE_BUILD_TYPE=Release -D CMAKE_INSTALL_PREFIX=/usr/local ..


Then simply build and install:

make
sudo make install

Once the build finishes, configure the OpenCV environment:
# 配置環(huán)境
# 打開配置文件
sudo gedit /etc/ld.so.conf
# 添加庫文件路徑,更新系統(tǒng)共享鏈接庫
include /usr/local/opencv-3.4.2/build/lib

sudo gedit /etc/ld.so.conf.d/opencv.conf
# 添加文本
/usr/local/lib 

# 配置文件生效
 sudo ldconfig

Configure bash:

# Configure bash
sudo gedit /etc/bash.bashrc
# Append these lines
PKG_CONFIG_PATH=$PKG_CONFIG_PATH:/usr/local/lib/pkgconfig
export PKG_CONFIG_PATH
# Apply the changes
source /etc/bash.bashrc
# Update the file database
sudo updatedb

Verify that the OpenCV installation works:

# OpenCV sample test
cd /usr/local/opencv-3.4.2/samples/cpp/example_cmake
cmake .
make
./opencv_example
# If the camera opens and a "Hello OpenCV" window appears, the installation succeeded
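If the cv2 Python bindings happened to be built as well (they are only generated when the matching Python development headers and numpy were found at CMake time), a quick check from Python looks like the sketch below; skip it if only the C++ API is needed:

```python
# Optional check: only works if the cv2 bindings for your interpreter were built.
import cv2

print(cv2.__version__)          # expect 3.4.2
img = cv2.imread("test.jpg")    # any local image; the path is just an example
if img is not None:
    print("image shape:", img.shape)
```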

II. YOLOv5 deployment

All of the following steps are performed on the Ubuntu system.
Typical deployment paths for deep learning models:
Path 1: pt --> onnx/torchscript/wts --> openvino/tensorrt --> Raspberry Pi / Jetson (Linux)
Path 2: pt --> onnx/torchscript --> ncnn --> Android (Android Studio)
Path 3: pt --> onnx/torchscript --> Core ML --> iOS (Xcode)

1. File preparation
Ultralytics YOLOv5 (version 6.2 or any later release works): https://github.com/ultralytics/yolov5/releases
tensorrtx (use the matching 6.x / latest branch; thanks to wang-xinyu for sharing it): https://github.com/wang-xinyu/tensorrtx/tree/yolov5-v6.0

# Required package
pip install pycuda
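A quick way to confirm that pycuda installed correctly and can see the GPU (just a sanity check):

```python
# pycuda sanity check.
import pycuda.driver as cuda
import pycuda.autoinit  # creates a CUDA context on the default device

print(cuda.Device(0).name())
free_mem, total_mem = cuda.mem_get_info()
print("free/total GPU memory (MB):", free_mem // 2**20, "/", total_mem // 2**20)
```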

2. Generating the wts file
Taking the pretrained yolov5s.pt as an example, the Ultralytics export script can convert it to torchscript, onnx, openvino, engine, coreml, saved_model, pb, tflite, edgetpu, tfjs, or paddle.
Current YOLOv5 and YOLOv7 releases support nearly every export format, along with instance segmentation and pose estimation.

# In the tensorrtx repository, open the yolov5 directory, copy gen_wts.py into your
# Ultralytics YOLOv5 project, and run the command below (download yolov5s.pt yourself):
python gen_wts.py ../../weights/yolov5s.pt ./
# The .wts file is written to the current directory; copy it into tensorrtx/yolov5
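For reference, gen_wts.py essentially dumps every tensor of the checkpoint's state_dict as plain-text hex floats. The sketch below is a simplified illustration of that format, not a replacement for the official script; run it from inside the yolov5 repository so the pickled model classes can be resolved:

```python
# Simplified sketch of the .wts format written by gen_wts.py:
# first line = number of tensors, then one line per tensor:
# "<name> <element count> <hex float> <hex float> ..."
import struct
import torch

model = torch.load("yolov5s.pt", map_location="cpu")["model"].float()
state_dict = model.state_dict()

with open("yolov5s.wts", "w") as f:
    f.write(f"{len(state_dict)}\n")
    for name, tensor in state_dict.items():
        values = tensor.reshape(-1).cpu().numpy()
        f.write(f"{name} {len(values)}")
        for v in values:
            f.write(" " + struct.pack(">f", float(v)).hex())
        f.write("\n")
```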

3. Generating the deployment engine
Go into the tensorrtx/yolov5 directory.
To change the configuration, edit these two files directly; the author's comments explain the options clearly:
tensorrtx/yolov5/yololayer.h
tensorrtx/yolov5/yolov5.cpp

mkdir build
cd build
cmake ..
make -j16  # if your machine cannot handle it, run plain make without -j16
# sudo ./yolov5 -s [.wts] [.engine] [s/m/l/x/s6/m6/l6/x6 or c/c6 gd gw]
sudo ./yolov5 -s ../yolov5s.wts yolov5s.engine s
# This produces your TensorRT .engine file
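Before moving on, the engine can be sanity-checked from Python: the custom YOLO layer lives in libmyplugins.so, so that library must be loaded before deserialization. A minimal sketch using the standard TensorRT Python API, with the same file paths used later by yolov5_trt.py:

```python
# Deserialize the freshly built engine and list its bindings.
import ctypes
import tensorrt as trt

ctypes.CDLL("build/libmyplugins.so")   # register the custom YoloLayer plugin

logger = trt.Logger(trt.Logger.INFO)
runtime = trt.Runtime(logger)
with open("build/yolov5s.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())

assert engine is not None
for i in range(engine.num_bindings):
    print(engine.get_binding_name(i), engine.get_binding_shape(i))
```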


4. Testing the deployed model on images

sudo ./yolov5 -d yolov5s.engine ../samples
# sudo ./yolov5 -d [.engine file] [folder of images to detect]
# The annotated images are written to the build folder

Python test:

# Parameter settings: thresholds
CONF_THRESH = 0.5
IOU_THRESHOLD = 0.4
# Required files:
PLUGIN_LIBRARY = "build/libmyplugins.so"   # custom plugin library
engine_file_path = "build/yolov5s.engine"  # engine file

cd ..
python3 yolov5_trt.py
# The annotated images are written to the output folder
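For reference, CONF_THRESH and IOU_THRESHOLD are applied in the post-processing step: boxes below the confidence threshold are dropped first, then overlapping boxes are suppressed by IoU. The NumPy sketch below only illustrates that logic on [x1, y1, x2, y2, score] boxes; yolov5_trt.py applies the same idea to the raw engine output:

```python
# Illustration of confidence filtering + NMS with the thresholds above.
import numpy as np

CONF_THRESH = 0.5
IOU_THRESHOLD = 0.4

def nms(boxes: np.ndarray) -> np.ndarray:
    boxes = boxes[boxes[:, 4] >= CONF_THRESH]        # drop low-confidence boxes
    order = boxes[:, 4].argsort()[::-1]              # highest score first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # IoU of the best box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.clip(xx2 - xx1, 0, None) * np.clip(yy2 - yy1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[order[1:], 2] - boxes[order[1:], 0]) * (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_r - inter)
        order = order[1:][iou <= IOU_THRESHOLD]      # suppress overlapping boxes
    return boxes[keep]

dets = nms(np.array([[10, 10, 100, 100, 0.9],
                     [12, 12, 98, 99, 0.6],
                     [200, 200, 260, 260, 0.7]]))
print(dets)  # the second box overlaps the first heavily and is suppressed
```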

5. Video detection:

#include <iostream>
#include <chrono>
#include <cmath>
#include "cuda_utils.h"
#include "logging.h"
#include "common.hpp"
#include "utils.h"
#include "calibrator.h"

#define USE_FP16  // set USE_INT8 or USE_FP16 or USE_FP32
#define DEVICE 0  // GPU id
#define NMS_THRESH 0.4
#define CONF_THRESH 0.5
#define BATCH_SIZE 1

// stuff we know about the network and the input/output blobs
static const int INPUT_H = Yolo::INPUT_H;
static const int INPUT_W = Yolo::INPUT_W;
static const int CLASS_NUM = Yolo::CLASS_NUM;
static const int OUTPUT_SIZE = Yolo::MAX_OUTPUT_BBOX_COUNT * sizeof(Yolo::Detection) / sizeof(float) + 1;  // we assume the yololayer outputs no more than MAX_OUTPUT_BBOX_COUNT boxes that conf >= 0.1
const char* INPUT_BLOB_NAME = "data";
const char* OUTPUT_BLOB_NAME = "prob";
static Logger gLogger;

static const char* my_classes[] = { "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
         "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
         "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
         "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard","surfboard",
         "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
         "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
         "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
         "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
         "hair drier", "toothbrush" };

static int get_width(int x, float gw, int divisor = 8) {
    //return math.ceil(x / divisor) * divisor
    if (int(x * gw) % divisor == 0) {
        return int(x * gw);
    }
    return (int(x * gw / divisor) + 1) * divisor;
}

static int get_depth(int x, float gd) {
    if (x == 1) {
        return 1;
    } else {
        return round(x * gd) > 1 ? round(x * gd) : 1;
    }
}

ICudaEngine* build_engine(unsigned int maxBatchSize, IBuilder* builder, IBuilderConfig* config, DataType dt, float& gd, float& gw, std::string& wts_name) {
    INetworkDefinition* network = builder->createNetworkV2(0U);

    // Create input tensor of shape {3, INPUT_H, INPUT_W} with name INPUT_BLOB_NAME
    ITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{ 3, INPUT_H, INPUT_W });
    assert(data);

    std::map<std::string, Weights> weightMap = loadWeights(wts_name);

    /* ------ yolov5 backbone------ */
    auto focus0 = focus(network, weightMap, *data, 3, get_width(64, gw), 3, "model.0");
    auto conv1 = convBlock(network, weightMap, *focus0->getOutput(0), get_width(128, gw), 3, 2, 1, "model.1");
    auto bottleneck_CSP2 = C3(network, weightMap, *conv1->getOutput(0), get_width(128, gw), get_width(128, gw), get_depth(3, gd), true, 1, 0.5, "model.2");
    auto conv3 = convBlock(network, weightMap, *bottleneck_CSP2->getOutput(0), get_width(256, gw), 3, 2, 1, "model.3");
    auto bottleneck_csp4 = C3(network, weightMap, *conv3->getOutput(0), get_width(256, gw), get_width(256, gw), get_depth(9, gd), true, 1, 0.5, "model.4");
    auto conv5 = convBlock(network, weightMap, *bottleneck_csp4->getOutput(0), get_width(512, gw), 3, 2, 1, "model.5");
    auto bottleneck_csp6 = C3(network, weightMap, *conv5->getOutput(0), get_width(512, gw), get_width(512, gw), get_depth(9, gd), true, 1, 0.5, "model.6");
    auto conv7 = convBlock(network, weightMap, *bottleneck_csp6->getOutput(0), get_width(1024, gw), 3, 2, 1, "model.7");
    auto spp8 = SPP(network, weightMap, *conv7->getOutput(0), get_width(1024, gw), get_width(1024, gw), 5, 9, 13, "model.8");

    /* ------ yolov5 head ------ */
    auto bottleneck_csp9 = C3(network, weightMap, *spp8->getOutput(0), get_width(1024, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.9");
    auto conv10 = convBlock(network, weightMap, *bottleneck_csp9->getOutput(0), get_width(512, gw), 1, 1, 1, "model.10");

    auto upsample11 = network->addResize(*conv10->getOutput(0));
    assert(upsample11);
    upsample11->setResizeMode(ResizeMode::kNEAREST);
    upsample11->setOutputDimensions(bottleneck_csp6->getOutput(0)->getDimensions());

    ITensor* inputTensors12[] = { upsample11->getOutput(0), bottleneck_csp6->getOutput(0) };
    auto cat12 = network->addConcatenation(inputTensors12, 2);
    auto bottleneck_csp13 = C3(network, weightMap, *cat12->getOutput(0), get_width(1024, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.13");
    auto conv14 = convBlock(network, weightMap, *bottleneck_csp13->getOutput(0), get_width(256, gw), 1, 1, 1, "model.14");

    auto upsample15 = network->addResize(*conv14->getOutput(0));
    assert(upsample15);
    upsample15->setResizeMode(ResizeMode::kNEAREST);
    upsample15->setOutputDimensions(bottleneck_csp4->getOutput(0)->getDimensions());

    ITensor* inputTensors16[] = { upsample15->getOutput(0), bottleneck_csp4->getOutput(0) };
    auto cat16 = network->addConcatenation(inputTensors16, 2);

    auto bottleneck_csp17 = C3(network, weightMap, *cat16->getOutput(0), get_width(512, gw), get_width(256, gw), get_depth(3, gd), false, 1, 0.5, "model.17");

    // yolo layer 0
    IConvolutionLayer* det0 = network->addConvolutionNd(*bottleneck_csp17->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.24.m.0.weight"], weightMap["model.24.m.0.bias"]);
    auto conv18 = convBlock(network, weightMap, *bottleneck_csp17->getOutput(0), get_width(256, gw), 3, 2, 1, "model.18");
    ITensor* inputTensors19[] = { conv18->getOutput(0), conv14->getOutput(0) };
    auto cat19 = network->addConcatenation(inputTensors19, 2);
    auto bottleneck_csp20 = C3(network, weightMap, *cat19->getOutput(0), get_width(512, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.20");
    //yolo layer 1
    IConvolutionLayer* det1 = network->addConvolutionNd(*bottleneck_csp20->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.24.m.1.weight"], weightMap["model.24.m.1.bias"]);
    auto conv21 = convBlock(network, weightMap, *bottleneck_csp20->getOutput(0), get_width(512, gw), 3, 2, 1, "model.21");
    ITensor* inputTensors22[] = { conv21->getOutput(0), conv10->getOutput(0) };
    auto cat22 = network->addConcatenation(inputTensors22, 2);
    auto bottleneck_csp23 = C3(network, weightMap, *cat22->getOutput(0), get_width(1024, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.23");
    IConvolutionLayer* det2 = network->addConvolutionNd(*bottleneck_csp23->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.24.m.2.weight"], weightMap["model.24.m.2.bias"]);

    auto yolo = addYoLoLayer(network, weightMap, "model.24", std::vector<IConvolutionLayer*>{det0, det1, det2});
    yolo->getOutput(0)->setName(OUTPUT_BLOB_NAME);
    network->markOutput(*yolo->getOutput(0));

    // Build engine
    builder->setMaxBatchSize(maxBatchSize);
    config->setMaxWorkspaceSize(16 * (1 << 20));  // 16MB
#if defined(USE_FP16)
    config->setFlag(BuilderFlag::kFP16);
#elif defined(USE_INT8)
    std::cout << "Your platform support int8: " << (builder->platformHasFastInt8() ? "true" : "false") << std::endl;
    assert(builder->platformHasFastInt8());
    config->setFlag(BuilderFlag::kINT8);
    Int8EntropyCalibrator2* calibrator = new Int8EntropyCalibrator2(1, INPUT_W, INPUT_H, "./coco_calib/", "int8calib.table", INPUT_BLOB_NAME);
    config->setInt8Calibrator(calibrator);
#endif

    std::cout << "Building engine, please wait for a while..." << std::endl;
    ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
    std::cout << "Build engine successfully!" << std::endl;

    // Don't need the network any more
    network->destroy();

    // Release host memory
    for (auto& mem : weightMap)
    {
        free((void*)(mem.second.values));
    }

    return engine;
}

ICudaEngine* build_engine_p6(unsigned int maxBatchSize, IBuilder* builder, IBuilderConfig* config, DataType dt, float& gd, float& gw, std::string& wts_name) {
    INetworkDefinition* network = builder->createNetworkV2(0U);

    // Create input tensor of shape {3, INPUT_H, INPUT_W} with name INPUT_BLOB_NAME
    ITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{ 3, INPUT_H, INPUT_W });
    assert(data);

    std::map<std::string, Weights> weightMap = loadWeights(wts_name);

    /* ------ yolov5 backbone------ */
    auto focus0 = focus(network, weightMap, *data, 3, get_width(64, gw), 3, "model.0");
    auto conv1 = convBlock(network, weightMap, *focus0->getOutput(0), get_width(128, gw), 3, 2, 1, "model.1");
    auto c3_2 = C3(network, weightMap, *conv1->getOutput(0), get_width(128, gw), get_width(128, gw), get_depth(3, gd), true, 1, 0.5, "model.2");
    auto conv3 = convBlock(network, weightMap, *c3_2->getOutput(0), get_width(256, gw), 3, 2, 1, "model.3");
    auto c3_4 = C3(network, weightMap, *conv3->getOutput(0), get_width(256, gw), get_width(256, gw), get_depth(9, gd), true, 1, 0.5, "model.4");
    auto conv5 = convBlock(network, weightMap, *c3_4->getOutput(0), get_width(512, gw), 3, 2, 1, "model.5");
    auto c3_6 = C3(network, weightMap, *conv5->getOutput(0), get_width(512, gw), get_width(512, gw), get_depth(9, gd), true, 1, 0.5, "model.6");
    auto conv7 = convBlock(network, weightMap, *c3_6->getOutput(0), get_width(768, gw), 3, 2, 1, "model.7");
    auto c3_8 = C3(network, weightMap, *conv7->getOutput(0), get_width(768, gw), get_width(768, gw), get_depth(3, gd), true, 1, 0.5, "model.8");
    auto conv9 = convBlock(network, weightMap, *c3_8->getOutput(0), get_width(1024, gw), 3, 2, 1, "model.9");
    auto spp10 = SPP(network, weightMap, *conv9->getOutput(0), get_width(1024, gw), get_width(1024, gw), 3, 5, 7, "model.10");
    auto c3_11 = C3(network, weightMap, *spp10->getOutput(0), get_width(1024, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.11");

    /* ------ yolov5 head ------ */
    auto conv12 = convBlock(network, weightMap, *c3_11->getOutput(0), get_width(768, gw), 1, 1, 1, "model.12");
    auto upsample13 = network->addResize(*conv12->getOutput(0));
    assert(upsample13);
    upsample13->setResizeMode(ResizeMode::kNEAREST);
    upsample13->setOutputDimensions(c3_8->getOutput(0)->getDimensions());
    ITensor* inputTensors14[] = { upsample13->getOutput(0), c3_8->getOutput(0) };
    auto cat14 = network->addConcatenation(inputTensors14, 2);
    auto c3_15 = C3(network, weightMap, *cat14->getOutput(0), get_width(1536, gw), get_width(768, gw), get_depth(3, gd), false, 1, 0.5, "model.15");

    auto conv16 = convBlock(network, weightMap, *c3_15->getOutput(0), get_width(512, gw), 1, 1, 1, "model.16");
    auto upsample17 = network->addResize(*conv16->getOutput(0));
    assert(upsample17);
    upsample17->setResizeMode(ResizeMode::kNEAREST);
    upsample17->setOutputDimensions(c3_6->getOutput(0)->getDimensions());
    ITensor* inputTensors18[] = { upsample17->getOutput(0), c3_6->getOutput(0) };
    auto cat18 = network->addConcatenation(inputTensors18, 2);
    auto c3_19 = C3(network, weightMap, *cat18->getOutput(0), get_width(1024, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.19");

    auto conv20 = convBlock(network, weightMap, *c3_19->getOutput(0), get_width(256, gw), 1, 1, 1, "model.20");
    auto upsample21 = network->addResize(*conv20->getOutput(0));
    assert(upsample21);
    upsample21->setResizeMode(ResizeMode::kNEAREST);
    upsample21->setOutputDimensions(c3_4->getOutput(0)->getDimensions());
    ITensor* inputTensors21[] = { upsample21->getOutput(0), c3_4->getOutput(0) };
    auto cat22 = network->addConcatenation(inputTensors21, 2);
    auto c3_23 = C3(network, weightMap, *cat22->getOutput(0), get_width(512, gw), get_width(256, gw), get_depth(3, gd), false, 1, 0.5, "model.23");

    auto conv24 = convBlock(network, weightMap, *c3_23->getOutput(0), get_width(256, gw), 3, 2, 1, "model.24");
    ITensor* inputTensors25[] = { conv24->getOutput(0), conv20->getOutput(0) };
    auto cat25 = network->addConcatenation(inputTensors25, 2);
    auto c3_26 = C3(network, weightMap, *cat25->getOutput(0), get_width(1024, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.26");

    auto conv27 = convBlock(network, weightMap, *c3_26->getOutput(0), get_width(512, gw), 3, 2, 1, "model.27");
    ITensor* inputTensors28[] = { conv27->getOutput(0), conv16->getOutput(0) };
    auto cat28 = network->addConcatenation(inputTensors28, 2);
    auto c3_29 = C3(network, weightMap, *cat28->getOutput(0), get_width(1536, gw), get_width(768, gw), get_depth(3, gd), false, 1, 0.5, "model.29");

    auto conv30 = convBlock(network, weightMap, *c3_29->getOutput(0), get_width(768, gw), 3, 2, 1, "model.30");
    ITensor* inputTensors31[] = { conv30->getOutput(0), conv12->getOutput(0) };
    auto cat31 = network->addConcatenation(inputTensors31, 2);
    auto c3_32 = C3(network, weightMap, *cat31->getOutput(0), get_width(2048, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.32");

    /* ------ detect ------ */
    IConvolutionLayer* det0 = network->addConvolutionNd(*c3_23->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.0.weight"], weightMap["model.33.m.0.bias"]);
    IConvolutionLayer* det1 = network->addConvolutionNd(*c3_26->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.1.weight"], weightMap["model.33.m.1.bias"]);
    IConvolutionLayer* det2 = network->addConvolutionNd(*c3_29->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.2.weight"], weightMap["model.33.m.2.bias"]);
    IConvolutionLayer* det3 = network->addConvolutionNd(*c3_32->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.3.weight"], weightMap["model.33.m.3.bias"]);

    auto yolo = addYoLoLayer(network, weightMap, "model.33", std::vector<IConvolutionLayer*>{det0, det1, det2, det3});
    yolo->getOutput(0)->setName(OUTPUT_BLOB_NAME);
    network->markOutput(*yolo->getOutput(0));

    // Build engine
    builder->setMaxBatchSize(maxBatchSize);
    config->setMaxWorkspaceSize(16 * (1 << 20));  // 16MB
#if defined(USE_FP16)
    config->setFlag(BuilderFlag::kFP16);
#elif defined(USE_INT8)
    std::cout << "Your platform support int8: " << (builder->platformHasFastInt8() ? "true" : "false") << std::endl;
    assert(builder->platformHasFastInt8());
    config->setFlag(BuilderFlag::kINT8);
    Int8EntropyCalibrator2* calibrator = new Int8EntropyCalibrator2(1, INPUT_W, INPUT_H, "./coco_calib/", "int8calib.table", INPUT_BLOB_NAME);
    config->setInt8Calibrator(calibrator);
#endif

    std::cout << "Building engine, please wait for a while..." << std::endl;
    ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
    std::cout << "Build engine successfully!" << std::endl;

    // Don't need the network any more
    network->destroy();

    // Release host memory
    for (auto& mem : weightMap)
    {
        free((void*)(mem.second.values));
    }

    return engine;
}

void APIToModel(unsigned int maxBatchSize, IHostMemory** modelStream, float& gd, float& gw, std::string& wts_name) {
    // Create builder
    IBuilder* builder = createInferBuilder(gLogger);
    IBuilderConfig* config = builder->createBuilderConfig();

    // Create model to populate the network, then set the outputs and create an engine
    ICudaEngine* engine = build_engine(maxBatchSize, builder, config, DataType::kFLOAT, gd, gw, wts_name);
    assert(engine != nullptr);

    // Serialize the engine
    (*modelStream) = engine->serialize();

    // Close everything down
    engine->destroy();
    builder->destroy();
    config->destroy();
}

void doInference(IExecutionContext& context, cudaStream_t& stream, void **buffers, float* input, float* output, int batchSize) {
    // DMA input batch data to device, infer on the batch asynchronously, and DMA output back to host
    CUDA_CHECK(cudaMemcpyAsync(buffers[0], input, batchSize * 3 * INPUT_H * INPUT_W * sizeof(float), cudaMemcpyHostToDevice, stream));
    context.enqueue(batchSize, buffers, stream, nullptr);
    CUDA_CHECK(cudaMemcpyAsync(output, buffers[1], batchSize * OUTPUT_SIZE * sizeof(float), cudaMemcpyDeviceToHost, stream));
    cudaStreamSynchronize(stream);
}

bool parse_args(int argc, char** argv, std::string& engine) {
    if (argc < 3) return false;
    if (std::string(argv[1]) == "-v" && argc == 3) {
        engine = std::string(argv[2]);
    } else {
        return false;
    }
    return true;
}

int main(int argc, char** argv) {
    cudaSetDevice(DEVICE);

    //std::string wts_name = "";
    std::string engine_name = "";
    //float gd = 0.0f, gw = 0.0f;
    //std::string img_dir;

	if(!parse_args(argc,argv,engine_name)){
		std::cerr << "arguments not right!" << std::endl;
        	std::cerr << "./yolov5 -v [.engine] // run inference with camera" << std::endl;
		return -1;
	}

	std::ifstream file(engine_name, std::ios::binary);
        if (!file.good()) {
		std::cerr<<" read "<<engine_name<<" error! "<<std::endl;
		return -1;
	}
	char *trtModelStream{ nullptr };
	size_t size = 0;
        file.seekg(0, file.end);
        size = file.tellg();
        file.seekg(0, file.beg);
        trtModelStream = new char[size];
        assert(trtModelStream);
        file.read(trtModelStream, size);
        file.close();


    // prepare input data ---------------------------
    static float data[BATCH_SIZE * 3 * INPUT_H * INPUT_W];
    //for (int i = 0; i < 3 * INPUT_H * INPUT_W; i++)
    //    data[i] = 1.0;
    static float prob[BATCH_SIZE * OUTPUT_SIZE];
    IRuntime* runtime = createInferRuntime(gLogger);
    assert(runtime != nullptr);
    ICudaEngine* engine = runtime->deserializeCudaEngine(trtModelStream, size);
    assert(engine != nullptr);
    IExecutionContext* context = engine->createExecutionContext();
    assert(context != nullptr);
    delete[] trtModelStream;
    assert(engine->getNbBindings() == 2);
    void* buffers[2];
    // In order to bind the buffers, we need to know the names of the input and output tensors.
    // Note that indices are guaranteed to be less than IEngine::getNbBindings()
    const int inputIndex = engine->getBindingIndex(INPUT_BLOB_NAME);
    const int outputIndex = engine->getBindingIndex(OUTPUT_BLOB_NAME);
    assert(inputIndex == 0);
    assert(outputIndex == 1);
    // Create GPU buffers on device
    CUDA_CHECK(cudaMalloc(&buffers[inputIndex], BATCH_SIZE * 3 * INPUT_H * INPUT_W * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&buffers[outputIndex], BATCH_SIZE * OUTPUT_SIZE * sizeof(float)));
    // Create stream
    cudaStream_t stream;
    CUDA_CHECK(cudaStreamCreate(&stream));


	cv::VideoCapture capture(0);
    //cv::VideoCapture capture("../overpass.mp4");
    //int fourcc = cv::VideoWriter::fourcc('M','J','P','G');
    //capture.set(cv::CAP_PROP_FOURCC, fourcc);
    if(!capture.isOpened()){
        std::cout << "Error opening video stream or file" << std::endl;
        return -1;
    }

	int key;
    int fcount = 0;
    while(1)
    {
        cv::Mat frame;
        capture >> frame;
        if(frame.empty())
        {
            std::cout << "Fail to read image from camera!" << std::endl;
            break;
        }
        fcount++;
        //if (fcount < BATCH_SIZE && f + 1 != (int)file_names.size()) continue;
        for (int b = 0; b < fcount; b++) {
            //cv::Mat img = cv::imread(img_dir + "/" + file_names[f - fcount + 1 + b]);
			cv::Mat img = frame;
            if (img.empty()) continue;
            cv::Mat pr_img = preprocess_img(img, INPUT_W, INPUT_H); // letterbox BGR to RGB
            int i = 0;
            for (int row = 0; row < INPUT_H; ++row) {
                uchar* uc_pixel = pr_img.data + row * pr_img.step;
                for (int col = 0; col < INPUT_W; ++col) {
                    data[b * 3 * INPUT_H * INPUT_W + i] = (float)uc_pixel[2] / 255.0;
                    data[b * 3 * INPUT_H * INPUT_W + i + INPUT_H * INPUT_W] = (float)uc_pixel[1] / 255.0;
                    data[b * 3 * INPUT_H * INPUT_W + i + 2 * INPUT_H * INPUT_W] = (float)uc_pixel[0] / 255.0;
                    uc_pixel += 3;
                    ++i;
                }
            }
        }

        // Run inference
        auto start = std::chrono::system_clock::now();
        doInference(*context, stream, buffers, data, prob, BATCH_SIZE);
        auto end = std::chrono::system_clock::now();
        //std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms" << std::endl;
        int fps = 1000.0/std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
		std::vector<std::vector<Yolo::Detection>> batch_res(fcount);
        for (int b = 0; b < fcount; b++) {
            auto& res = batch_res[b];
            nms(res, &prob[b * OUTPUT_SIZE], CONF_THRESH, NMS_THRESH);
        }
        for (int b = 0; b < fcount; b++) {
            auto& res = batch_res[b];
            //std::cout << res.size() << std::endl;
            //cv::Mat img = cv::imread(img_dir + "/" + file_names[f - fcount + 1 + b]);
            for (size_t j = 0; j < res.size(); j++) {
                cv::Rect r = get_rect(frame, res[j].bbox);
                cv::rectangle(frame, r, cv::Scalar(0x27, 0xC1, 0x36), 2);
		std::string label = my_classes[(int)res[j].class_id];
                cv::putText(frame, label, cv::Point(r.x, r.y - 1), cv::FONT_HERSHEY_PLAIN, 1.2, cv::Scalar(0xFF, 0xFF, 0xFF), 2);
                //add FPS in the windows
//				std::string jetson_fps = "Jetson NX FPS: " + std::to_string(fps);
//				cv::putText(frame, jetson_fps, cv::Point(11,80), cv::FONT_HERSHEY_PLAIN, 3, cv::Scalar(0, 0, 255), 2, cv::LINE_AA);
			}
            //cv::imwrite("_" + file_names[f - fcount + 1 + b], img);
        }
		cv::imshow("yolov5",frame);
        key = cv::waitKey(1);
        if (key == 'q'){
            break;
        }
        fcount = 0;
    }

	capture.release();
    // Release stream and buffers
    cudaStreamDestroy(stream);
    CUDA_CHECK(cudaFree(buffers[inputIndex]));
    CUDA_CHECK(cudaFree(buffers[outputIndex]));
    // Destroy the engine
    context->destroy();
    engine->destroy();
    runtime->destroy();

    return 0;
}

Replace the original yolov5.cpp with the code above, then rebuild:

cd tensorrtx/yolov5/build
cmake ..
make
sudo ./yolov5 -v yolov5s.engine
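The same camera loop can also be written in Python on top of the yolov5_trt.py wrapper. The sketch below uses a hypothetical detect(frame) helper standing in for the preprocessing, inference and NMS code from that script; plug the real wrapper into it:

```python
# Minimal Python camera loop. detect() is a hypothetical placeholder for the
# yolov5_trt.py pipeline; it is assumed to return [x1, y1, x2, y2, score, class_id] rows.
import cv2

def detect(frame):
    return []  # replace with the real TensorRT wrapper

cap = cv2.VideoCapture(0)
while cap.isOpened():
    ok, frame = cap.read()
    if not ok:
        break
    for x1, y1, x2, y2, score, cls in detect(frame):
        cv2.rectangle(frame, (int(x1), int(y1)), (int(x2), int(y2)), (0x27, 0xC1, 0x36), 2)
    cv2.imshow("yolov5", frame)
    if cv2.waitKey(1) == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```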

Reference: https://blog.csdn.net/qq_36786467/article/details/121218402
