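I. Preface
1) Goals
The main purpose of this guide is to show a fast way to use PaddlePaddle, Baidu's open-source deep learning platform. PaddlePaddle provides many pre-trained AI models, so developers with basic knowledge of Python, Docker and Linux can set up a PaddlePaddle AI model for use in a project within one to a few working days.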
Hardware supported by PaddlePaddle:
- NVIDIA GPUs: CUDA 10.2, CUDA 11.2, CUDA 11.6, CUDA 11.7
- AMD GPUs: ROCm 4.0
- CPU
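PaddlePaddle model development kits (training with each kit produces different models):
- Text recognition - PaddleOCR: extracting text from ID cards, train tickets, VAT invoices, license plates, LCD meter readings, seals, etc.
- Video understanding - PaddleVideo: abnormal behavior detection in industrial settings, action clipping in sports, video quality assessment in internet scenarios
- Object detection - PaddleDetection: traffic flow counting, vehicle violation detection, intrusion detection, surface quality inspection
- Image segmentation - PaddleSeg: road waterlogging recognition, regional change detection
- Speech recognition - PaddleSpeech: speech translation, speech synthesis, punctuation restoration
- Language understanding - ERNIE: automatic question answering, sentiment analysis, semantic similarity based recommendation
- Graph neural networks - PGL: recommender systems, knowledge graphs, risk control, traffic forecasting
- Spatio-temporal big data computing - PaddleSpatial: algorithm support for urban region profiling, intelligent traffic management and road planning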
This guide starts from the PaddlePaddle CPU Docker image, installs the PaddleOCR development kit, and uses its PP-OCRv3 text recognition model.
A lightweight Python Flask web application turns the PaddleOCR SDK into an HTTP API service (very simple, under 100 lines of code); the SDKs of other models can be wrapped the same way by following this example.
To meet production release requirements, Flask is run under uWSGI and packaged as a Docker image for easy deployment.
Once built, the image can run in an offline environment.
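2) Open issues
- PaddleOCR publishes its own PaddleOCR images on Docker Hub under the PaddleCloud (paddlecloud) account. In my tests the known problems were: the default image does not provide the paddleocr command, and its Flask version is older than the one we end up with on paddlepaddle/paddle:2.4.1. After a brief attempt I set it aside and went back to paddlepaddle/paddle:2.4.1 as the upstream image for this build;
- After the paddleocr module is installed, actual use still requires downloading the model files. Since I did not find the proper way to pre-place the model files, the Dockerfile currently runs one paddleocr test recognition so that the models are downloaded automatically;
- paddleocr_http.py is deliberately simple: Flask's logging configuration is not separated from the program, and only an image endpoint is implemented (there is no PDF endpoint). Both need to be refined to fit your project.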
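II. Installing PaddlePaddle with Docker
PaddlePaddle is the base runtime of the platform; the pre-trained models it provides depend on this runtime.
Open the PaddlePaddle quick-install page and choose the PaddlePaddle version, operating system and compute platform that match your setup; Docker is the recommended installation method. The example below uses Docker on my Intel-based MacBook: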
Create a directory to mount into the container, used to exchange files between the container and the host:
mkdir paddle
cd paddle
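Pull the image and start a shell in the Docker container (port 8080 is published for the Flask service used later):
docker run --name paddle_docker -it -p 8080:8080 -v "$PWD":/paddle registry.baidubce.com/paddlepaddle/paddle:2.4.1 /bin/bash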
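III. Installing the PP-OCRv3 model in the PaddlePaddle Docker environment and running a test
Put the ID card photo you want to test into the paddle directory created above, and inside the container change to /paddle. Then install the paddleocr package (its default model is PP-OCRv3), following the official quick-start guide: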
cd /paddle
python3 -m pip install paddleocr
Command output:
Collecting paddleocr
Downloading paddleocr-2.6.1.2-py3-none-any.whl (440 kB)
|████████████████████████████████| 440 kB 688 kB/s
Collecting opencv-python
Downloading opencv_python-4.7.0.68-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (61.8 MB)
|████████████████████████████████| 61.8 MB 4.6 MB/s
Collecting scikit-image
Downloading scikit_image-0.19.3-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (13.5 MB)
|████████████████████████████████| 13.5 MB 4.4 MB/s
Collecting attrdict
Downloading attrdict-2.0.1-py2.py3-none-any.whl (9.9 kB)
Collecting lmdb
Downloading lmdb-1.4.0-cp37-cp37m-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (299 kB)
|████████████████████████████████| 299 kB 373 kB/s
……
Successfully installed Babel-2.11.0 Flask-Babel-3.0.1 PyMuPDF-1.20.2 PyWavelets-1.3.0 Werkzeug-2.2.2 aiofiles-22.1.0 aiohttp-3.8.3 aiosignal-1.3.1 altair-4.2.2 anyio-3.6.2 async-timeout-4.0.2 asynctest-0.13.0 attrdict-2.0.1 bce-python-sdk-0.8.74 beautifulsoup4-4.11.1 brotli-1.0.9 cachetools-5.3.0 click-8.1.3 cssselect-1.2.0 cssutils-2.6.0 cycler-0.11.0 cython-0.29.33 dill-0.3.6 et-xmlfile-1.1.0 fastapi-0.89.1 ffmpy-0.3.0 fire-0.5.0 flask-2.2.2 fonttools-4.38.0 frozenlist-1.3.3 fsspec-2023.1.0 future-0.18.3 gevent-22.10.2 geventhttpclient-2.0.2 gradio-3.16.2 greenlet-2.0.2 grpcio-1.42.0 h11-0.14.0 httpcore-0.16.3 httpx-0.23.3 imageio-2.25.0 imgaug-0.4.0 importlib-resources-5.10.2 itsdangerous-2.1.2 jinja2-3.1.2 jsonschema-4.17.3 kiwisolver-1.4.4 linkify-it-py-1.0.3 lmdb-1.4.0 lxml-4.9.2 markdown-it-py-2.1.0 markupsafe-2.1.2 matplotlib-3.5.3 mdit-py-plugins-0.3.3 mdurl-0.1.2 mpmath-1.2.1 multidict-6.0.4 multiprocess-0.70.14 networkx-2.6.3 onnx-1.13.0 opencv-contrib-python-4.7.0.68 opencv-python-4.7.0.68 openpyxl-3.0.10 orjson-3.8.5 paddleocr-2.6.1.2 pandas-1.3.5 pdf2docx-0.5.6 pkgutil-resolve-name-1.3.10 premailer-3.10.0 psutil-5.9.4 pyclipper-1.3.0.post4 pycryptodome-3.17 pydantic-1.10.4 pydub-0.25.1 pyrsistent-0.19.3 python-docx-0.8.11 python-multipart-0.0.5 python-rapidjson-1.9 pytz-2022.7.1 rapidfuzz-2.13.7 rarfile-4.0 rfc3986-1.5.0 scikit-image-0.19.3 scipy-1.7.3 shapely-2.0.0 sniffio-1.3.0 soupsieve-2.3.2.post1 starlette-0.22.0 sympy-1.10.1 termcolor-2.2.0 tifffile-2021.11.2 toolz-0.12.0 tqdm-4.64.1 tritonclient-2.29.0 uc-micro-py-1.0.1 uvicorn-0.20.0 visualdl-2.5.0 websockets-10.4 x2paddle-1.4.0 yarl-1.8.2 zope.event-4.6 zope.interface-5.5.2
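Run the ID card image test. The first run downloads some required model files; later runs reuse them. In my tests, a single ID card image took about 1 to 2 seconds to recognize, regardless of image size.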
paddleocr --image_dir ./small.jpg --use_angle_cls true --use_gpu false --lang=ch
Command output:
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
download https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_det_infer.tar to /root/.paddleocr/whl/det/ch/ch_PP-OCRv3_det_infer/ch_PP-OCRv3_det_infer.tar
100%|█████████████████████████████████████████████████████████████████████████████| 3.83M/3.83M [00:01<00:00, 3.43MiB/s]
download https://paddleocr.bj.bcebos.com/PP-OCRv3/chinese/ch_PP-OCRv3_rec_infer.tar to /root/.paddleocr/whl/rec/ch/ch_PP-OCRv3_rec_infer/ch_PP-OCRv3_rec_infer.tar
100%|█████████████████████████████████████████████████████████████████████████████| 11.9M/11.9M [00:07<00:00, 1.52MiB/s]
download https://paddleocr.bj.bcebos.com/dygraph_v2.0/ch/ch_ppocr_mobile_v2.0_cls_infer.tar to /root/.paddleocr/whl/cls/ch_ppocr_mobile_v2.0_cls_infer/ch_ppocr_mobile_v2.0_cls_infer.tar
100%|█████████████████████████████████████████████████████████████████████████████| 2.19M/2.19M [00:01<00:00, 1.84MiB/s]
[2023/01/30 04:54:30] ppocr DEBUG: Namespace(alpha=1.0, benchmark=False, beta=1.0, cls_batch_num=6, cls_image_shape='3, 48, 192', cls_model_dir='/root/.paddleocr/whl/cls/ch_ppocr_mobile_v2.0_cls_infer', cls_thresh=0.9, cpu_threads=10, crop_res_save_dir='./output', det=True, det_algorithm='DB', det_box_type='quad', det_db_box_thresh=0.6, det_db_score_mode='fast', det_db_thresh=0.3, det_db_unclip_ratio=1.5, det_east_cover_thresh=0.1, det_east_nms_thresh=0.2, det_east_score_thresh=0.8, det_limit_side_len=960, det_limit_type='max', det_model_dir='/root/.paddleocr/whl/det/ch/ch_PP-OCRv3_det_infer', det_pse_box_thresh=0.85, det_pse_min_area=16, det_pse_scale=1, det_pse_thresh=0, det_sast_nms_thresh=0.2, det_sast_score_thresh=0.5, draw_img_save_dir='./inference_results', drop_score=0.5, e2e_algorithm='PGNet', e2e_char_dict_path='./ppocr/utils/ic15_dict.txt', e2e_limit_side_len=768, e2e_limit_type='max', e2e_model_dir=None, e2e_pgnet_mode='fast', e2e_pgnet_score_thresh=0.5, e2e_pgnet_valid_set='totaltext', enable_mkldnn=False, fourier_degree=5, gpu_mem=500, help='==SUPPRESS==', image_dir='./small.jpg', image_orientation=False, ir_optim=True, kie_algorithm='LayoutXLM', label_list=['0', '180'], lang='ch', layout=True, layout_dict_path=None, layout_model_dir=None, layout_nms_threshold=0.5, layout_score_threshold=0.5, max_batch_size=10, max_text_length=25, merge_no_span_structure=True, min_subgraph_size=15, mode='structure', ocr=True, ocr_order_method=None, ocr_version='PP-OCRv3', output='./output', page_num=0, precision='fp32', process_id=0, re_model_dir=None, rec=True, rec_algorithm='SVTR_LCNet', rec_batch_num=6, rec_char_dict_path='/usr/local/lib/python3.7/dist-packages/paddleocr/ppocr/utils/ppocr_keys_v1.txt', rec_image_inverse=True, rec_image_shape='3, 48, 320', rec_model_dir='/root/.paddleocr/whl/rec/ch/ch_PP-OCRv3_rec_infer', recovery=False, save_crop_res=False, save_log_path='./log_output/', scales=[8, 16, 32], ser_dict_path='../train_data/XFUND/class_list_xfun.txt', 
ser_model_dir=None, show_log=True, sr_batch_num=1, sr_image_shape='3, 32, 128', sr_model_dir=None, structure_version='PP-StructureV2', table=True, table_algorithm='TableAttn', table_char_dict_path=None, table_max_len=488, table_model_dir=None, total_process_num=1, type='ocr', use_angle_cls=True, use_dilation=False, use_gpu=False, use_mp=False, use_npu=False, use_onnx=False, use_pdf2docx_api=False, use_pdserving=False, use_space_char=True, use_tensorrt=False, use_visual_backbone=True, use_xpu=False, vis_font_path='./doc/fonts/simfang.ttf', warmup=False)
[2023/01/30 04:54:38] ppocr INFO: **********./small.jpg**********
[2023/01/30 04:54:45] ppocr DEBUG: dt_boxes num : 12, elapse : 6.396355390548706
[2023/01/30 04:54:45] ppocr DEBUG: cls num : 12, elapse : 0.3474392890930176
[2023/01/30 04:54:50] ppocr DEBUG: rec_res num : 12, elapse : 4.751603364944458
[2023/01/30 04:54:50] ppocr INFO: [[[259.0, 112.0], [391.0, 112.0], [391.0, 153.0], [259.0, 153.0]], ('***', 0.9297130703926086)]
[2023/01/30 04:54:50] ppocr INFO: [[[142.0, 125.0], [250.0, 121.0], [251.0, 153.0], [143.0, 157.0]], ('姓名', 0.9741479158401489)]
[2023/01/30 04:54:50] ppocr INFO: [[[182.0, 194.0], [488.0, 194.0], [488.0, 229.0], [182.0, 229.0]], ('別男民族漢', 0.991115391254425)]
[2023/01/30 04:54:50] ppocr INFO: [[[162.0, 272.0], [222.0, 272.0], [222.0, 303.0], [162.0, 303.0]], ('生', 0.9976730942726135)]
[2023/01/30 04:54:50] ppocr INFO: [[[248.0, 272.0], [573.0, 269.0], [573.0, 300.0], [248.0, 304.0]], ('19**年*月3日', 0.9517846703529358)]
[2023/01/30 04:54:50] ppocr INFO: [[[140.0, 351.0], [225.0, 351.0], [225.0, 384.0], [140.0, 384.0]], ('住址', 0.9953198432922363)]
[2023/01/30 04:54:50] ppocr INFO: [[[256.0, 354.0], [641.0, 351.0], [642.0, 382.0], [256.0, 384.0]], ('武漢市**區*****', 0.9569936394691467)]
[2023/01/30 04:54:50] ppocr INFO: [[[256.0, 405.0], [610.0, 401.0], [610.0, 435.0], [256.0, 439.0]], ('******-*-****', 0.9409841299057007)]
[2023/01/30 04:54:50] ppocr INFO: [[[137.0, 557.0], [362.0, 555.0], [362.0, 586.0], [138.0, 589.0]], ('公民身份號碼', 0.9767408967018127)]
[2023/01/30 04:54:50] ppocr INFO: [[[406.0, 553.0], [949.0, 550.0], [949.0, 584.0], [406.0, 586.0]], ('4***************', 0.9046091437339783)]
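Each INFO line above is one recognized text line: a box given as four [x, y] corner points, followed by a (text, confidence) tuple. The HTTP service in the next section walks the same structure as returned by PaddleOCR.ocr(), which holds one entry per input image. Below is a sketch of that walk using two entries copied from the log, so it runs without PaddleOCR installed; to_lines is a hypothetical helper, and skipping empty entries is a defensive assumption because the value returned for an image without any text has differed between PaddleOCR releases.

```python
from typing import List, Optional, Sequence, Tuple

# Shapes as seen in the log lines above.
Box = List[List[float]]                # four [x, y] corner points
Line = Tuple[Box, Tuple[str, float]]   # (box, (text, confidence))


def to_lines(result: Sequence[Optional[Sequence[Line]]]) -> List[Tuple[str, float]]:
    """Flatten a PaddleOCR-style result into (text, confidence) pairs.

    `result` has one entry per input image. An entry may be None or empty
    when no text was found, so such entries are skipped instead of raising.
    """
    lines = []
    for page in result:
        if not page:
            continue
        for _box, (text, score) in page:
            lines.append((text, score))
    return lines


# Two entries copied from the log above, wrapped as a one-image result.
sample = [[
    [[[259.0, 112.0], [391.0, 112.0], [391.0, 153.0], [259.0, 153.0]], ("***", 0.9297130703926086)],
    [[[142.0, 125.0], [250.0, 121.0], [251.0, 153.0], [143.0, 157.0]], ("姓名", 0.9741479158401489)],
]]
print(to_lines(sample))
```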
IV. Exposing OCR through a standard HTTP interface
In the paddle directory mounted into the container, write a Python Flask web application named paddleocr_http.py:
```python
import threading
import time
from logging.config import dictConfig

import cv2
import numpy as np
from flask import Flask, jsonify, request
from paddleocr import PaddleOCR

# Configure Flask logging
dictConfig(
    {
        "version": 1,
        "formatters": {
            "default": {
                "format": "[%(asctime)s] %(levelname)s in %(module)s: %(message)s",
            }
        },
        "handlers": {
            "wsgi": {
                "class": "logging.StreamHandler",
                "stream": "ext://flask.logging.wsgi_errors_stream",
                "formatter": "default",
            }
        },
        "root": {"level": "DEBUG", "handlers": ["wsgi"]},
    }
)

app = Flask(__name__)
# Output Chinese characters as-is instead of \uXXXX escapes
# (Flask >= 2.2 JSON provider API; the JSON_AS_ASCII config key is deprecated in 2.2)
app.json.ensure_ascii = False
# Declare UTF-8 so browsers do not garble Chinese text
app.json.mimetype = "application/json;charset=UTF-8"
# Use Flask's logger under the usual name
logger = app.logger

# PaddleOCR loads its models when constructed, which is slow, so build one instance
# per worker thread on first use and reuse it instead of creating one per request.
# Threads do not share an instance because the predictor is not assumed to be thread-safe.
_local = threading.local()


def get_ocr():
    ocr = getattr(_local, "ocr", None)
    if ocr is None:
        # Switch the recognition language via lang, e.g. `ch`, `en`, `fr`, `german`, `korean`, `japan`
        ocr = PaddleOCR(use_angle_cls=True, lang="ch")
        _local.ocr = ocr
    return ocr


def upload_image(image_bytes):
    """Recognize text in encoded image bytes; return [(text, confidence), ...] or None if undecodable."""
    # Decode the image bytes into the ndarray the OCR API expects
    np_arr = np.frombuffer(image_bytes, dtype=np.uint8)
    img = cv2.imdecode(np_arr, cv2.IMREAD_COLOR)
    if img is None:
        return None
    # Run recognition; cls=True enables the text direction classifier
    result = get_ocr().ocr(img, cls=True)
    logger.debug(result)
    ocr_result = []
    for res in result:  # one entry per image
        if not res:  # None or empty when no text was detected
            continue
        for line in res:  # line = [box, (text, confidence)]
            ocr_result.append(line[1])
    return ocr_result


@app.route("/hello")
def hello():
    return "Hello!"


@app.route("/v1/img/ocr", methods=["GET", "POST"])
def api_v1_upload_img():
    """Recognize text in an uploaded image; return each text with its confidence.

    Text lines rotated by 180 degrees are handled by the angle classifier (use_angle_cls).
    GET returns a simple upload form.
    """
    req_start = time.time()
    if request.method == "POST":
        if "file" not in request.files:
            return "No file part", 400
        file = request.files["file"]
        if file.filename == "":
            return "No selected file", 400
        # Read the uploaded image bytes and run OCR
        res = upload_image(file.read())
        if res is None:
            return "Unsupported or corrupt image", 400
        req_elapsed = time.time() - req_start
        logger.info("Image %s OCR took %.2f seconds. result: %s", file.filename, req_elapsed, res)
        return jsonify(res)
    return """
<!doctype html>
<title>Upload new File</title>
<h1>Upload new File</h1>
<form method=post enctype=multipart/form-data>
<input type=file name=file>
<input type=submit value=Upload>
</form>
"""
```
V. Debugging Flask in development mode inside Docker
flask --app paddleocr_http --debug run --host 0.0.0.0 --port 8080
Command output:
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
* Serving Flask app 'paddleocr_http'
* Debug mode: on
[2023-01-31 01:08:54,162] INFO in _internal: WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:8080
* Running on http://172.17.0.2:8080
[2023-01-31 01:08:54,163] INFO in _internal: Press CTRL+C to quit
[2023-01-31 01:08:54,164] INFO in _internal: * Restarting with stat
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
[2023-01-31 01:08:56,326] WARNING in _internal: * Debugger is active!
[2023-01-31 01:08:56,341] INFO in _internal: * Debugger PIN: 108-237-100
To get back to the command line of the container created earlier:
- Make sure the container is running; if it is not, start it with docker start paddle_docker;
- Use docker ps to find the ID of the paddle_docker container;
- Run docker attach [CONTAINERID] to enter the container's command line;
- Run cd /paddle, then start Flask in debug mode as shown above.
Test with curl:
curl -X POST -F file=@./large.jpg http://localhost:8080/v1/img/ocr
Command output:
[
[
"***",
0.9297130703926086
],
[
"姓名",
0.9741479158401489
],
[
"別男民族漢",
0.991115391254425
],
[
"生",
0.9976730942726135
],
[
"19**年*月*日",
0.9517846703529358
],
[
"住址",
0.9953198432922363
],
[
"**市**區**路**",
0.9569936394691467
],
[
"******-*-****",
0.9409841299057007
],
[
"公民身份號碼",
0.9767408967018127
],
[
"******************",
0.9046091437339783
]
]
In each inner array, the first element is the recognized text and the second is its confidence score.
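Since each entry is a [text, confidence] pair, a client can drop low-confidence fragments before further processing. A minimal sketch, assuming the JSON body shown above; confident_texts is a hypothetical name and the thresholds are arbitrary. PaddleOCR itself already discards results below drop_score=0.5, as the DEBUG Namespace line in Section III shows.

```python
import json
from typing import List


def confident_texts(body: str, threshold: float = 0.9) -> List[str]:
    """Return the recognized strings whose confidence is at least `threshold`."""
    entries = json.loads(body)  # [[text, confidence], ...] as returned by /v1/img/ocr
    return [text for text, score in entries if score >= threshold]


# Three entries copied from the response above.
body = '[["姓名", 0.9741479158401489], ["住址", 0.9953198432922363], ["******-*-****", 0.9409841299057007]]'
print(confident_texts(body, threshold=0.95))  # prints ['姓名', '住址']
```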
You can also open http://localhost:8080/v1/img/ocr in a browser and upload an image file there to test.
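For scripted tests without curl, the same upload can be made with Python's standard library. A sketch assuming the service listens on http://localhost:8080 and expects the form field file, as in paddleocr_http.py; build_multipart and ocr_image are hypothetical names. The multipart body is assembled by hand because urllib has no form-data encoder.

```python
import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path
from typing import Tuple


def build_multipart(field: str, filename: str, data: bytes) -> Tuple[bytes, str]:
    """Encode one file as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("ascii")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def ocr_image(path: str, url: str = "http://localhost:8080/v1/img/ocr", timeout: float = 30.0):
    """POST an image like `curl -F file=@path` and return the decoded JSON list."""
    file_path = Path(path)
    body, content_type = build_multipart("file", file_path.name, file_path.read_bytes())
    req = urllib.request.Request(url, data=body, method="POST",
                                 headers={"Content-Type": content_type})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, iterating over ocr_image("./large.jpg") with for text, score in ... yields the same pairs as the curl call above.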
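VI. Production deployment
Flask's built-in server is lightweight and easy to use, but it is not suitable for production and does not scale well. Flask applications can be self-hosted with Gunicorn, uWSGI, Gevent, Eventlet, Twisted Web and others; here we deploy with uWSGI.
uWSGI is a fast application server written in C. It is highly configurable and also provides many other tools for writing robust web applications. To run a Flask application, you only need to tell uWSGI how to import your Flask application object.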
Make sure any app.run() call is placed inside an if __name__ == '__main__': block or in a separate file, so that it is never invoked when the module is imported. Each call starts a local WSGI development server, which is not needed when deploying with uWSGI.
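What uWSGI actually loads is a WSGI callable as defined in PEP 3333: an object that takes environ and start_response and returns an iterable of bytes. A Flask app object is such a callable, which is why --callable app in the command below is enough. The stdlib-only sketch shows that contract with a plain function and calls it the way a server would, without opening a socket; the names are illustrative only and are not part of paddleocr_http.py.

```python
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """A bare WSGI callable: the kind of object uWSGI's --callable names."""
    if environ.get("PATH_INFO") == "/hello":
        status, body = "200 OK", b"Hello!"
    else:
        status, body = "404 Not Found", b"Not Found"
    start_response(status, [("Content-Type", "text/plain; charset=utf-8"),
                            ("Content-Length", str(len(body)))])
    return [body]


def call_like_a_server(path):
    """Invoke `app` the way a WSGI server does, without any network I/O."""
    environ = {"PATH_INFO": path}
    setup_testing_defaults(environ)  # fills REQUEST_METHOD, SERVER_NAME, wsgi.* keys, ...
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


print(call_like_a_server("/hello"))
```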
Install uWSGI:
python3 -m pip install uwsgi
Command output:
Collecting uwsgi
Downloading uwsgi-2.0.21.tar.gz (808 kB)
|████████████████████████████████| 808 kB 806 kB/s
Building wheels for collected packages: uwsgi
Building wheel for uwsgi (setup.py) ... done
Created wheel for uwsgi: filename=uWSGI-2.0.21-cp37-cp37m-linux_x86_64.whl size=559887 sha256=c02418d94313937f621fc9fbac7ae073a349a11f436661164bdefba01669774e
Stored in directory: /root/.cache/pip/wheels/b1/b8/6a/cafb5a30fed7e484147b84224e4264ab3930dfaf0586c326fb
Successfully built uwsgi
Installing collected packages: uwsgi
Successfully installed uwsgi-2.0.21
Copy the final paddleocr_http.py from the mounted directory /paddle to /home inside the container:
cp -rp /paddle/paddleocr_http.py /home/paddleocr_http.py
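uWSGI offers several front-end modes, including an HTTP/HTTPS router/proxy/load-balancer. We use the HTTP mode. When you use uWSGI's HTTP server, uWSGI still forwards requests to uWSGI workers, in one of two ways: embedded or standalone. In embedded mode, it spawns the uWSGI workers and sets up the communication socket automatically. In standalone mode, you must specify the address of the uWSGI socket to connect to. We use embedded mode.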
uWSGI serves the WSGI callable found in a Python module. Our Flask application lives in paddleocr_http.py, so the command is:
uwsgi --http 0.0.0.0:8080 --master --wsgi-file /home/paddleocr_http.py --callable app --processes 4 --threads 2
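--processes 4 (short form -p 4) starts 4 worker processes, and --threads 2 gives each worker 2 threads, so up to 4 × 2 = 8 requests can be processed at the same time. --http 0.0.0.0:8080 serves on port 8080 on all interfaces.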
Command output:
*** Starting uWSGI 2.0.21 (64bit) on [Tue Jan 31 02:22:59 2023] ***
compiled with version: 7.5.0 on 31 January 2023 01:58:47
os: Linux-5.15.49-linuxkit #1 SMP Tue Sep 13 07:51:46 UTC 2022
nodename: 874df41b5721
machine: x86_64
clock source: unix
detected number of CPU cores: 6
current working directory: /paddle
detected binary path: /usr/local/bin/uwsgi
!!! no internal routing support, rebuild with pcre support !!!
uWSGI running as root, you can use --uid/--gid/--chroot options
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) ***
your memory page size is 4096 bytes
detected max file descriptor number: 1048576
lock engine: pthread robust mutexes
thunder lock: disabled (you can enable it with --thunder-lock)
uWSGI http bound on 0.0.0.0:8080 fd 4
uwsgi socket 0 bound to TCP address 127.0.0.1:36529 (port auto-assigned) fd 3
uWSGI running as root, you can use --uid/--gid/--chroot options
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) ***
Python version: 3.7.13 (default, Apr 24 2022, 01:04:09) [GCC 7.5.0]
Python main interpreter initialized at 0x555a043df120
uWSGI running as root, you can use --uid/--gid/--chroot options
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) ***
python threads support enabled
your server socket listen backlog is limited to 100 connections
your mercy for graceful operations on workers is 60 seconds
mapped 416880 bytes (407 KB) for 8 cores
*** Operational MODE: preforking+threaded ***
grep: warning: GREP_OPTIONS is deprecated; please use an alias or script
WSGI app 0 (mountpoint='') ready in 2 seconds on interpreter 0x555a043df120 pid: 1145 (default app)
uWSGI running as root, you can use --uid/--gid/--chroot options
*** WARNING: you are running uWSGI as root !!! (use the --uid flag) ***
*** uWSGI is running in multiple interpreter mode ***
spawned uWSGI master process (pid: 1145)
spawned uWSGI worker 1 (pid: 1165, cores: 2)
spawned uWSGI worker 2 (pid: 1168, cores: 2)
spawned uWSGI worker 3 (pid: 1171, cores: 2)
spawned uWSGI worker 4 (pid: 1174, cores: 2)
spawned uWSGI http 1 (pid: 1177)
Here we ignore the warnings about running as root.
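In the log above each of the four workers is spawned with cores: 2, giving 8 request slots, while uWSGI detected 6 CPU cores. One way to observe how the service behaves under parallel load is to send a burst of concurrent requests and compare the total wall-clock time with the per-request latency. A sketch: send is any zero-argument callable that performs one request (for instance a small wrapper around an HTTP upload), and burst is a hypothetical helper name. On a CPU-only host the OCR requests compete for the same cores, so do not expect an 8x speedup.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple


def burst(send: Callable[[], object], requests: int, concurrency: int) -> Tuple[float, List[float]]:
    """Run `send` `requests` times on `concurrency` threads.

    Returns (wall_clock_seconds, per_request_latencies_in_seconds).
    """
    def timed() -> float:
        start = time.perf_counter()
        send()
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda _: timed(), range(requests)))
    return time.perf_counter() - wall_start, latencies


# Stand-in for a real request: sleep 0.2 s to mimic server-side work.
wall, lat = burst(lambda: time.sleep(0.2), requests=8, concurrency=8)
print(f"wall {wall:.2f}s, mean latency {sum(lat) / len(lat):.2f}s")
```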
Because systemd cannot be used inside Docker to start services automatically, and to follow the convention of one service per container, we build a paddle-http image on top of paddlepaddle/paddle:2.4.1. If you have a CUDA or ROCm environment, remember to switch to the matching base image when building paddle-http.
First, on the host (not inside the Docker container), write a Dockerfile in the same directory as paddleocr_http.py, next to a test image named test.jpg:
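```dockerfile
# Based on the PaddlePaddle image; choose the version tag and CPU/GPU variant you need
FROM paddlepaddle/paddle:2.4.1
# Set the working directory to /home
WORKDIR /home
# Copy the application and a test image into the working directory
COPY paddleocr_http.py /home
COPY test.jpg /home
# Install the app's dependencies with pip: currently only the paddleocr model kit and the uwsgi server.
# To use more models in this image, add them here and extend paddleocr_http.py accordingly.
RUN python3 -m pip install paddleocr uwsgi
# Run one recognition so that paddleocr downloads its model files into the image
RUN paddleocr --image_dir ./test.jpg --use_angle_cls true --use_gpu false --lang=ch
# Document that the service listens on port 8080 (publish it with -p at run time)
EXPOSE 8080
# Start uwsgi in embedded HTTP mode as the container's main process
ENTRYPOINT ["uwsgi", "--http", "0.0.0.0:8080", \
            "--master", \
            "--wsgi-file", "/home/paddleocr_http.py", \
            "--callable", "app", \
            "--processes", "4", \
            "--threads", "2"]
```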
Edit paddleocr_http.py and set Flask's default log level to INFO:
"root": {"level": "INFO", "handlers": ["wsgi"]},
Build the Docker image:
docker build . -t paddle-http
Command output:
[+] Building 268.6s (11/11) FINISHED
=> [internal] load build definition from Dockerfile 0.0s
=> => transferring dockerfile: 37B 0.0s
=> [internal] load .dockerignore 0.0s
=> => transferring context: 2B 0.0s
=> [internal] load metadata for docker.io/paddlepaddle/paddle:2.4.1 1.2s
=> [internal] load build context 0.0s
=> => transferring context: 180B 0.0s
=> [1/6] FROM docker.io/paddlepaddle/paddle:2.4.1@sha256:72d9cfad34dcfae39743 0.0s
=> CACHED [2/6] WORKDIR /home 0.0s
=> CACHED [3/6] COPY paddleocr_http.py /home 0.0s
=> CACHED [4/6] COPY test.jpg /home 0.0s
=> [5/6] RUN python3 -m pip install paddleocr uwsgi 237.8s
=> [6/6] RUN paddleocr --image_dir ./test.jpg --use_angle_cls true --use_gpu 22.0s
=> exporting to image 7.5s
=> => exporting layers 7.5s
=> => writing image sha256:a4469cddbb63ffa66c25826a96a7a912e13107a1811dd665e3 0.0s
=> => naming to docker.io/library/paddle-http 0.0s
Use 'docker scan' to run Snyk tests against images to find vulnerabilities and learn how to fix them
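The build relies on the paddleocr test run (step 6/6) to download the models, which is the workaround listed under open issues in the preface. To make the build fail early if that download did not happen, a small check could be added as a further Dockerfile step. A sketch, saved under a hypothetical name check_models.py next to the Dockerfile: the three directory names are the ones printed in the download log in Section III, while the assumption that each unpacked directory holds inference.pdmodel and inference.pdiparams reflects how recent PaddleOCR releases extract the .tar files; verify it against your installed version.

```python
from pathlib import Path
from typing import List

# Directories taken from the download log in Section III (lang=ch).
MODEL_DIRS = [
    "det/ch/ch_PP-OCRv3_det_infer",
    "rec/ch/ch_PP-OCRv3_rec_infer",
    "cls/ch_ppocr_mobile_v2.0_cls_infer",
]
# Assumed contents of each unpacked model directory.
REQUIRED_FILES = ["inference.pdmodel", "inference.pdiparams"]


def missing_model_files(root: Path) -> List[Path]:
    """Return every expected model file that does not exist under `root`."""
    return [
        root / model_dir / name
        for model_dir in MODEL_DIRS
        for name in REQUIRED_FILES
        if not (root / model_dir / name).is_file()
    ]


def main() -> int:
    """Print missing files to stdout and return a process exit code (0 = all present)."""
    root = Path.home() / ".paddleocr" / "whl"  # /root/.paddleocr/whl when building as root
    missing = missing_model_files(root)
    for path in missing:
        print(f"missing model file: {path}")
    return 1 if missing else 0
```

It could be wired in with COPY check_models.py /home and, after the paddleocr test step, RUN python3 -c "import sys, check_models; sys.exit(check_models.main())"; because WORKDIR is /home, the module is importable from there.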
Stop the paddle_docker container so that port 8080 is free:
docker stop paddle_docker
Start the paddle-http application for the first time:
docker run --name paddle-http -p 8080:8080 paddle-http
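The log level set earlier is baked into the image, and the open-issues list in the preface notes that the logging configuration is not separated from the program. One option is to read the level from an environment variable, so it can be changed per container with docker run -e. A sketch under these assumptions: the variable name OCR_LOG_LEVEL and the helper build_logging_config are hypothetical, and the handler stream defaults to sys.stderr so the snippet runs without Flask; in paddleocr_http.py you would pass stream="ext://flask.logging.wsgi_errors_stream" and hand the result to dictConfig instead of the literal dict.

```python
import logging
import logging.config
import os
from typing import Mapping, Optional

VALID_LEVELS = {"DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"}


def build_logging_config(env: Optional[Mapping[str, str]] = None,
                         stream: str = "ext://sys.stderr") -> dict:
    """Return a dictConfig dict whose root level comes from OCR_LOG_LEVEL (default INFO)."""
    env = os.environ if env is None else env
    level = env.get("OCR_LOG_LEVEL", "INFO").upper()
    if level not in VALID_LEVELS:
        level = "INFO"  # fall back instead of failing on a typo
    return {
        "version": 1,
        "formatters": {
            "default": {"format": "[%(asctime)s] %(levelname)s in %(module)s: %(message)s"}
        },
        "handlers": {
            "wsgi": {"class": "logging.StreamHandler", "stream": stream, "formatter": "default"}
        },
        "root": {"level": level, "handlers": ["wsgi"]},
    }


logging.config.dictConfig(build_logging_config())
logging.getLogger(__name__).info("logging configured")
```

For example: docker run --name paddle-http -e OCR_LOG_LEVEL=DEBUG -p 8080:8080 paddle-http.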
To start it again after it has been stopped:
docker start paddle-http
Once paddle-http has started, test it by opening http://localhost:8080/v1/img/ocr in a browser.
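uWSGI reports the app as ready only after importing paddleocr_http.py (about 2 seconds in the uWSGI log in this section), so a request sent immediately after docker run or docker start may fail. A deployment or test script can poll the /hello route before sending OCR requests. A stdlib-only sketch; wait_until_ready is a hypothetical name and the timeout values are arbitrary.

```python
import http.client
import time
import urllib.request


def wait_until_ready(url: str = "http://localhost:8080/hello",
                     timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll `url` until it answers HTTP 200 or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, http.client.HTTPException):
            pass  # not listening yet, still loading, or a non-200 answer
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```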