1. Installing tensorflow-gpu
Building wheels for collected packages: tensorflow-gpu
Building wheel for tensorflow-gpu (setup.py): started
Building wheel for tensorflow-gpu (setup.py): finished with status 'error'
Running setup.py clean for tensorflow-gpu
error: subprocess-exited-with-error
× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [18 lines of output]
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-i6frcfa8/tensorflow-gpu_2cea358528754cc596c541f9c2ce45ca/setup.py", line 37, in <module>
raise Exception(TF_REMOVAL_WARNING)
Exception:
=========================================================
The "tensorflow-gpu" package has been removed!
Please install "tensorflow" instead.
Other than the name, the two packages have been identical
since TensorFlow 2.1, or roughly since Sep 2019. For more
information, see: pypi.org/project/tensorflow-gpu
=========================================================
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for tensorflow-gpu
Failed to build tensorflow-gpu
As the message says, "Other than the name, the two packages have been identical since TensorFlow 2.1" — in other words, any tensorflow release from 2.1 onward already ships with GPU support, so install tensorflow instead of tensorflow-gpu.
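Since the two package names are otherwise identical, a quick way to see which name is actually installed is to query the package metadata. A minimal stdlib-only sketch (the package names are the ones from the error message above):

```python
# Check which TensorFlow package name is installed, if any.
# Since TF 2.1 the plain "tensorflow" wheel already includes GPU support,
# so "tensorflow-gpu" should not be needed (and can no longer be built).
from importlib.metadata import version, PackageNotFoundError

def installed_version(pkg: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(pkg)
    except PackageNotFoundError:
        return None

print("tensorflow:", installed_version("tensorflow"))
print("tensorflow-gpu:", installed_version("tensorflow-gpu"))
```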
2. Using the GPU with Docker
Driver versions differ across GPU models; the driver and CUDA versions in this environment are:
[root@localhost ~]# nvidia-smi
# Output
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.27.04 Driver Version: 460.27.04 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
2.1 Could not find cuda drivers
# Error message
I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
When a program inside a Docker container cannot see the CUDA environment, try the following steps:
- Check the CUDA version: first, confirm that CUDA is correctly installed on the host by running nvcc --version.
- Use an NVIDIA Docker image: NVIDIA provides preconfigured images that already contain CUDA and the other required libraries; they can be used as the base image in a Dockerfile.
- Set environment variables: in the Dockerfile, use the ENV instruction. For example, if CUDA is installed under /usr/local/cuda, add ENV PATH /usr/local/cuda/bin:$PATH to the Dockerfile.
- Use nvidia-docker: nvidia-docker is a tool for running GPU-accelerated Docker containers.
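The NVIDIA base-image and ENV approaches above can be sketched in a Dockerfile like this (illustrative only; the base-image tag and CUDA paths are assumptions and must match the host's driver and CUDA setup):

```dockerfile
# Hypothetical sketch: start from an NVIDIA CUDA base image that already
# bundles the CUDA runtime libraries.
FROM nvidia/cuda:11.2.2-cudnn8-runtime-ubuntu20.04

# Make CUDA binaries and libraries visible to processes in the container.
ENV PATH=/usr/local/cuda/bin:$PATH
ENV LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```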
Checking the CUDA version confirmed the host setup. Because the image here was loaded from an exported image file, options 2 and 3 were not applicable, so the environment variables were ultimately set with -e:
# Add the CUDA environment variables
-e PATH=/usr/local/cuda-11.2/bin:$PATH -e LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64:$LD_LIBRARY_PATH
# Launch command
nvidia-docker run --name deepface --privileged=true --restart=always --net="host" -e PATH=/usr/local/cuda-11.2/bin:$PATH -e LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64:$LD_LIBRARY_PATH -v /root/.deepface/weights/:/root/.deepface/weights/ -v /usr/local/cuda-11.2/:/usr/local/cuda-11.2/ -d deepface_image
2.2 was unable to find libcuda.so DSO
I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:168] retrieving CUDA diagnostic information for host: localhost.localdomain
I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:175] hostname: localhost.localdomain
I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:199] libcuda reported version is: NOT_FOUND: was unable to find libcuda.so DSO loaded into this program
I tensorflow/compiler/xla/stream_executor/cuda/cuda_diagnostics.cc:203] kernel reported version is: 460.27.4
On Linux, Docker can mount host directories into a container, and any symlinks inside a mounted directory are mounted along with it. Note, however, that the paths those symlinks point to must also be accessible inside the container: if a symlink's target has not been mounted into the container, resolving the link there will fail.
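The dangling-symlink behavior described above is easy to reproduce outside Docker. A minimal sketch using throwaway paths under /tmp (the file names are made up):

```shell
# A symlink only resolves if its target exists in the current filesystem
# view -- inside a container, the "view" is whatever was mounted in.
mkdir -p /tmp/linkdemo/target
echo "payload" > /tmp/linkdemo/target/libfake.so.1
ln -sf /tmp/linkdemo/target/libfake.so.1 /tmp/linkdemo/libfake.so

# Resolves while the target exists:
cat /tmp/linkdemo/libfake.so

# Remove the target (analogous to the target not being mounted):
rm /tmp/linkdemo/target/libfake.so.1
cat /tmp/linkdemo/libfake.so 2>/dev/null || echo "dangling symlink"
```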
Reference: https://blog.csdn.net/u013546508/article/details/88637434. The steps that solved the problem in this environment:
# 1. Locate the libcuda.so files
find / -name libcuda.so*
# Results
/usr/lib/libcuda.so
/usr/lib/libcuda.so.1
/usr/lib/libcuda.so.460.27.04
/usr/lib64/libcuda.so
/usr/lib64/libcuda.so.1
/usr/lib64/libcuda.so.460.27.04
# 2. Check LD_LIBRARY_PATH
echo $LD_LIBRARY_PATH
# Output
/usr/local/cuda/lib64
# 3. Copy the 64-bit libcuda.so.460.27.04 into a directory on LD_LIBRARY_PATH (libcuda.so and libcuda.so.1 are both symlinks)
cp /usr/lib64/libcuda.so.460.27.04 /usr/local/cuda-11.2/lib64/
# 4. Create the symlinks (run inside /usr/local/cuda-11.2/lib64/)
ln -s libcuda.so.460.27.04 libcuda.so.1
ln -s libcuda.so.1 libcuda.so
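After copying the library and recreating the links, you can verify from Python that the library is now loadable, the same way TensorFlow loads it internally. A small stdlib-only sketch (on a machine without the NVIDIA driver it simply reports False):

```python
# Try to dlopen a shared library by name. If this returns False for
# "libcuda.so.1", LD_LIBRARY_PATH is still wrong or the driver library
# is missing from the container's view of the filesystem.
import ctypes

def can_dlopen(name: str) -> bool:
    """Return True if the shared library can be loaded."""
    try:
        ctypes.CDLL(name)
        return True
    except OSError:
        return False

print("libcuda.so.1 loadable:", can_dlopen("libcuda.so.1"))
```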
2.3 Could not find TensorRT && Cannot dlopen some GPU libraries
I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
W tensorflow/core/common_runtime/gpu/gpu_device.cc:1960] Cannot dlopen some GPU libraries. Please make sure the missing libraries mentioned above are installed properly if you would like to use GPU. Follow the guide at https://www.tensorflow.org/install/gpu for how to download and setup the required libraries for your platform.
Skipping registering GPU devices...
This error is caused by TensorRT not being installed in the Docker image. Add the install command to the Dockerfile and rebuild the image:
RUN pip install tensorrt -i https://pypi.tuna.tsinghua.edu.cn/simple
Alternatively (not recommended), install it inside the running container:
# 1. Look up the container ID
docker ps
# 2. Enter the running container
docker exec -it ContainerID /bin/bash
# 3. Install the package
pip install tensorrt -i https://pypi.tuna.tsinghua.edu.cn/simple
# 4. Commit a new image (the new image can be exported for reuse)
docker commit ContainerID imageName:version
Behavior after installing:
root@localhost:/app# python
Python 3.8.18 (default, Sep 20 2023, 11:41:31)
[GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
# Importing tensorflow alone still reports the TF-TRT warning
>>> import tensorflow as tf
2023-10-09 10:15:55.482545: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-10-09 10:15:56.498608: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
# Importing tensorrt first, then tensorflow, makes the GPU usable
>>> import tensorrt as tr
>>> import tensorflow as tf
>>> tf.test.is_gpu_available()
WARNING:tensorflow:From <stdin>:1: is_gpu_available (from tensorflow.python.framework.test_util) is deprecated and will be removed in a future version.
Instructions for updating:
Use `tf.config.list_physical_devices('GPU')` instead.
2023-10-09 10:16:41.452672: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1639] Created device /device:GPU:0 with 11389 MB memory: -> device: 0, name: Tesla T4, pci bus id: 0000:2f:00.0, compute capability: 7.5
True
To apply this workaround, add the following code to the Python file the container executes on startup; here it was added to app.py:
import tensorrt as tr
import tensorflow as tf

if __name__ == "__main__":
    available = tf.config.list_physical_devices('GPU')
    print(f"available:{available}")
The file after adding the code:
# 3rd party dependencies
import tensorrt as tr
import tensorflow as tf
from flask import Flask
from routes import blueprint

def create_app():
    available = tf.config.list_physical_devices('GPU')
    print(f"available:{available}")
    app = Flask(__name__)
    app.register_blueprint(blueprint)
    return app
Start the container:
nvidia-docker run --name deepface --privileged=true --restart=always --net="host" -e PATH=/usr/local/cuda-11.2/bin:$PATH -e LD_LIBRARY_PATH=/usr/local/cuda-11.2/lib64:$LD_LIBRARY_PATH -v /root/.deepface/weights/:/root/.deepface/weights/ -v /usr/local/cuda-11.2/:/usr/local/cuda-11.2/ -v /opt/xinan-facesearch-service-public/deepface/api/app.py:/app/app.py -d deepface_image
2.4 Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:437] Could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:441] Memory usage: 1100742656 bytes free, 15843721216 bytes total.
E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:451] Possibly insufficient driver version: 460.27.4
W tensorflow/core/framework/op_kernel.cc:1828] OP_REQUIRES failed at conv_ops_impl.h:770 : UNIMPLEMENTED: DNN library is not found.
This error is caused by cuDNN not being installed; installing it resolves the problem.
2.5 CuDNN library needs to have matching major version and equal or higher minor version
The installed cuDNN runtime did not match the version TensorFlow was compiled against; after aligning the versions, the GPU worked.
E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:425] Loaded runtime CuDNN library: 8.1.1 but source was compiled with: 8.6.0. CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library. If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
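To compare the installed cuDNN against the version in the error message, you can read cuDNN's version header. A sketch under the assumption that the header lives at /usr/include/cudnn_version.h (adjust the path to wherever cuDNN is installed on your system):

```shell
# Print the cuDNN version macros from the installed header, or a notice
# if the header is not at the assumed location.
HDR=/usr/include/cudnn_version.h
if [ -f "$HDR" ]; then
    grep -E '#define CUDNN_(MAJOR|MINOR|PATCHLEVEL)' "$HDR"
else
    echo "cudnn_version.h not found at $HDR"
fi
```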