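With So-vits we can train all kinds of voice models ourselves and then re-create any song we want to hear, which gives us the freedom to request any track. Yet something still feels missing: the picture. We hear the voice but never see the singer. This time we bring AI Trump's singing voice and his imposing image together, using PaddleGAN to build a "great voice, great looks" version of the "Know-It-All King" (懂王).
PaddlePaddle is Baidu's open-source deep learning framework. Its scope is broad: in total it covers 40 models across three domains (text, image, and video), leaving few corners of deep learning untouched.
Wav2lip, a submodule of PaddleGAN's visual-effects models, is a repackaged and optimized version of the open-source Wav2lip library. It synchronizes a person's mouth shape with the input vocal audio. Put plainly, it makes the lips in a still image move so that the person appears to be singing.
Beyond that, Wav2lip can also replace the lip movements in an existing video directly and output a video that matches the target audio. This way we can use AI to build our own custom talking-head presenter.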
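Configuring CUDA and cuDNN locally
Getting the PaddlePaddle framework running locally is not trivial, but because it is backed by Baidu, a heavyweight in China's deep learning scene, the documentation is plentiful. Follow it step by step and little should go wrong.
First, set up a Python 3.10 development environment locally. See the article: Installing and configuring a Python 3.10 development environment on different architectures (Intel x86/Apple M1 silicon) and different platforms (Win10/Win11/Mac/Ubuntu)
Next, configure CUDA and cuDNN locally. cuDNN is a GPU acceleration library for deep learning built on top of CUDA, and deep learning computation on the GPU depends on it. Think of cuDNN as the tool for the job and CUDA as the computing platform the tool runs on. The two must be installed in matching versions.
First, open the NVIDIA Control Panel and check which CUDA version the installed NVIDIA driver supports:
On the author's machine the graphics card is an RTX 4060 and the current driver supports up to CUDA 12.1. In other words, any CUDA version less than or equal to 12.1 is supported.
Then consult the official PaddlePaddle documentation to see which framework builds support Python 3.10: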
https://www.paddlepaddle.org.cn/documentation/docs/zh/install/Tables.html#ciwhls-release
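According to the documentation, the highest build available for Python 3.10 is win-cuda11.6-cudnn8.4-mkl-vs2017-avx, that is, CUDA 11.6 with cuDNN 8.4. Nothing newer is supported.
So this machine needs CUDA 11.6 and cuDNN 8.4.
Make sure the versions match exactly, otherwise the program will fail to start later.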
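The rule above (a CUDA toolkit is usable when its version is less than or equal to the maximum the driver supports) can be expressed as a small check. This is only an illustrative sketch: the helper names `parse_version` and `toolkit_supported` are made up for this example, and the version strings are the ones quoted in this article.

```python
def parse_version(text):
    """Turn a dotted version string such as '11.6' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def toolkit_supported(toolkit, driver_max):
    """Return True when the toolkit version does not exceed the driver's maximum."""
    return parse_version(toolkit) <= parse_version(driver_max)


# Values taken from this article: driver maximum 12.1, toolkit 11.6.
print(toolkit_supported("11.6", "12.1"))
print(toolkit_supported("12.2", "12.1"))
```

Comparing tuples of integers rather than raw strings avoids the classic mistake where "11.10" sorts before "11.6" in a plain string comparison.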
With the version numbers known, all that remains is to download the installers from the NVIDIA website.
CUDA 11.6 installer download page:
https://developer.nvidia.com/cuda-toolkit-archive
cuDNN 8.4 download page:
https://developer.nvidia.com/rdp/cudnn-archive
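Install CUDA 11.6 first. Once that finishes, extract the cuDNN 8.4 archive and copy the extracted files into the CUDA 11.6 installation directory. The CUDA installation path is: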
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6
Then add the bin directory to the system environment variables:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\bin
Next, open a terminal in the demo folder:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\extras\demo_suite
Run bandwidthTest.exe, which returns:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\extras\demo_suite>bandwidthTest.exe
[CUDA Bandwidth Test] - Starting...
Running on...
Device 0: NVIDIA GeForce RTX 4060 Laptop GPU
Quick Mode
Host to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 12477.8
Device to Host Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 12337.3
Device to Device Bandwidth, 1 Device(s)
PINNED Memory Transfers
Transfer Size (Bytes) Bandwidth(MB/s)
33554432 179907.9
Result = PASS
NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.
This means the installation succeeded. You can then query the GPU device with deviceQuery.exe:
C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.6\extras\demo_suite>deviceQuery.exe
deviceQuery.exe Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 1 CUDA Capable device(s)
Device 0: "NVIDIA GeForce RTX 4060 Laptop GPU"
CUDA Driver Version / Runtime Version 12.1 / 11.6
CUDA Capability Major/Minor version number: 8.9
Total amount of global memory: 8188 MBytes (8585216000 bytes)
MapSMtoCores for SM 8.9 is undefined. Default to use 128 Cores/SM
MapSMtoCores for SM 8.9 is undefined. Default to use 128 Cores/SM
(24) Multiprocessors, (128) CUDA Cores/MP: 3072 CUDA Cores
GPU Max Clock rate: 2370 MHz (2.37 GHz)
Memory Clock rate: 8001 Mhz
Memory Bus Width: 128-bit
L2 Cache Size: 33554432 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384)
Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers
Total amount of constant memory: zu bytes
Total amount of shared memory per block: zu bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 1536
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: zu bytes
Texture alignment: zu bytes
Concurrent copy and kernel execution: Yes with 1 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
CUDA Device Driver Mode (TCC or WDDM): WDDM (Windows Display Driver Model)
Device supports Unified Addressing (UVA): Yes
Device supports Compute Preemption: Yes
Supports Cooperative Kernel Launch: Yes
Supports MultiDevice Co-op Kernel Launch: No
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.1, CUDA Runtime Version = 11.6, NumDevs = 1, Device0 = NVIDIA GeForce RTX 4060 Laptop GPU
Result = PASS
At this point CUDA and cuDNN are configured.
Configuring the PaddlePaddle framework
With CUDA configured, let's install the PaddlePaddle framework:
python -m pip install paddlepaddle-gpu==2.4.2.post116 -f https://www.paddlepaddle.org.cn/whl/windows/mkl/avx/stable.html
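This installs the GPU build of paddlepaddle, version 2.4.2.post116. 2.4 was the latest release at the time of writing, and the 116 in the suffix stands for the CUDA version, 11.6. Be careful not to get the version wrong.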
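To make the naming convention concrete, here is a minimal sketch that reads the CUDA version out of such a version string. The function name `cuda_version_from_wheel` is hypothetical, and the parsing rule (last digit is the minor version, the digits before it are the major version) is an assumption based on the 116 → 11.6 example in this article, not an official PaddlePaddle specification.

```python
import re


def cuda_version_from_wheel(version):
    """Extract the CUDA version encoded in a '.postNNN' suffix, or None if absent.

    Assumption: the last digit is the minor version and the preceding
    digits are the major version (e.g. 'post116' -> '11.6').
    """
    match = re.search(r"\.post(\d+)$", version)
    if match is None:
        return None
    digits = match.group(1)
    return f"{digits[:-1]}.{digits[-1]}"


print(cuda_version_from_wheel("2.4.2.post116"))
```

A check like this can be compared against the CUDA toolkit actually installed before running pip, which is cheaper than discovering a mismatch at import time.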
Then clone the PaddleGAN project:
git clone https://gitee.com/PaddlePaddle/PaddleGAN
From inside the cloned PaddleGAN directory, run the following command to build and install the project locally:
pip install -v -e .
Then install the remaining dependencies:
pip install -r requirements.txt
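There are a few pitfalls worth pointing out:
First, PaddleGAN still depends on an older numpy and does not support the 1.24 release (the latest at the time of writing), so if your numpy is 1.24 you need to uninstall it first: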
pip uninstall numpy
Then install version 1.21:
pip install numpy==1.21
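Before uninstalling anything, it can help to check whether the installed numpy is actually affected. The sketch below is an illustration only: `numpy_too_new` is a made-up helper, and the 1.24 threshold comes from the pitfall described above.

```python
def numpy_too_new(version, first_bad=(1, 24)):
    """Return True when a numpy version string is at or above the first unsupported release."""
    major_minor = tuple(int(part) for part in version.split(".")[:2])
    return major_minor >= first_bad


# Example version strings; in practice pass numpy.__version__ instead.
print(numpy_too_new("1.24.3"))
print(numpy_too_new("1.21.6"))
```

Only the major and minor components are compared, so patch releases such as 1.21.6 are treated the same as 1.21.0.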
Next, verify in a Python terminal that PaddlePaddle is installed correctly:
import paddle
paddle.utils.run_check()
If it reports this error:
PreconditionNotMetError: The third-party dynamic library (cudnn64_7.dll) that Paddle depends on is not configured correctly. (error code is 126)
Suggestions:
1. Check if the third-party dynamic library (e.g. CUDA, CUDNN) is installed correctly and its version is matched with paddlepaddle you installed.
2. Configure third-party dynamic library environment variables as follows:
- Linux: set LD_LIBRARY_PATH by `export LD_LIBRARY_PATH=...`
- Windows: set PATH by `set PATH=XXX; (at ..\paddle\phi\backends\dynload\dynamic_loader.cc:305)
[operator < fill_constant > error]
then you need to download the cudnn64_7.dll dynamic library and copy it into the bin directory of CUDA 11.6. The download location for the library is given at the end of this article.
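Since the error is about a dynamic library not being found, a quick way to diagnose it is to list which PATH directories actually contain the file. The following is a minimal sketch using only the standard library; `find_on_path` is a hypothetical helper, and it only checks for the file's presence, not whether the library loads.

```python
import os


def find_on_path(filename, path_value=None):
    """Return the directories in a PATH-style string that contain the given file."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    hits = []
    for directory in path_value.split(os.pathsep):
        # Skip empty entries produced by stray separators.
        if directory and os.path.isfile(os.path.join(directory, filename)):
            hits.append(directory)
    return hits


# The DLL name comes from the error message above.
print(find_on_path("cudnn64_7.dll"))
```

An empty list suggests the DLL is not in any directory on PATH, which is consistent with the error above.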
Run the verification again, and it returns:
Python 3.10.11 (tags/v3.10.11:7d4cc5a, Apr 5 2023, 00:38:17) [MSC v.1929 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import paddle
>>> paddle.utils.run_check()
Running verify PaddlePaddle program ...
W0517 20:15:34.881800 31592 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.9, Driver API Version: 12.1, Runtime API Version: 11.6
W0517 20:15:34.889958 31592 gpu_resources.cc:91] device: 0, cuDNN Version: 8.4.
PaddlePaddle works well on 1 GPU.
PaddlePaddle works well on 1 GPUs.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.
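That means everything is in place and the installation succeeded.
Local inference
Now let's give Trump's song some moving visuals. First, generate a still image of him with Stable-Diffusion:
For Stable-Diffusion, see: Building the Stable-Diffusion-Webui AI painting library on all platforms (native/Docker) (Python3.10/Pytorch1.13.0). For reasons of space it is not covered again here.
Next, go to the project's tools directory: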
\PaddleGAN\applications\tools>
Put Trump's still image and the song file in the tools directory.
Then run the following command for local inference:
python .\wav2lip.py --face .\Trump.jpg --audio test.wav --outfile pp_put.mp4 --face_enhancement
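Here --face is the target image, --audio is the song the lips should match, and --outfile is the output video.
--face_enhancement turns on face enhancement. When the flag is omitted, enhancement is off by default.
Note that using this flag requires downloading a separate model file.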
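If you want to run the same command from a script, for example to process several songs in a row, the argument list can be assembled in Python. This is a sketch under stated assumptions: `build_wav2lip_command` is a made-up helper, it only uses the four flags shown in this article, and it assumes it is run from the tools directory where wav2lip.py lives.

```python
import sys


def build_wav2lip_command(face, audio, outfile, face_enhancement=False):
    """Assemble the argument list for wav2lip.py using the flags from this article."""
    command = [
        sys.executable, "wav2lip.py",
        "--face", face,
        "--audio", audio,
        "--outfile", outfile,
    ]
    if face_enhancement:
        # Optional flag; it needs the separately downloaded enhancement model.
        command.append("--face_enhancement")
    return command


print(build_wav2lip_command("Trump.jpg", "test.wav", "pp_put.mp4", face_enhancement=True))
```

The resulting list can be passed to `subprocess.run(command, check=True)`. Passing a list rather than a single string avoids quoting problems with file names that contain spaces.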
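The key to Wav2Lip's accurate lip-to-speech synchronization is its lip-sync discriminator, which forces the generator to keep producing accurate, realistic lip movements. In addition, it improves visual quality in two ways: it feeds the discriminator several consecutive frames rather than a single frame, which accounts for temporal correlation, and it uses a visual quality loss rather than only a contrastive loss.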
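To give a feel for what a lip-sync score might look like, here is a toy sketch: it measures the cosine similarity between a video embedding and an audio embedding and penalizes low similarity with a negative log. This is a simplified illustration and an assumption on my part, not PaddleGAN's or Wav2Lip's actual implementation. The embeddings are made-up numbers, and real models would produce them with neural encoders over windows of consecutive frames.

```python
import math


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / max(norm, eps)


def sync_loss(video_embeddings, audio_embeddings, eps=1e-8):
    """Mean negative log of the clamped cosine similarity over all pairs."""
    total = 0.0
    for v, s in zip(video_embeddings, audio_embeddings):
        # Clamp into [eps, 1] so the logarithm is always defined.
        p_sync = min(max(cosine_similarity(v, s), eps), 1.0)
        total += -math.log(p_sync)
    return total / len(video_embeddings)


# Hypothetical embeddings: the first pair is aligned, the second is orthogonal.
print(sync_loss([[1.0, 0.0]], [[1.0, 0.0]]))
print(sync_loss([[1.0, 0.0]], [[0.0, 1.0]]))
```

The aligned pair yields a loss near zero while the orthogonal pair yields a large loss, which is the pressure that pushes a generator toward lip shapes matching the audio.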
The result:
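Conclusion
At times the progress of AI really does feel like stepping into another era: what you hear may not be real, and what you see may not be true either. Finally, the finished video can be found on YouTube (or Bilibili) by searching for 刘悦的技术博客 (Liu Yue's Tech Blog). You are welcome to take a look. All installers and dynamic libraries mentioned in this article are available at: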
https://pan.baidu.com/s/1-6NA2uAOSRlT4O0FGEKUGA?pwd=oo0d
Extraction code: oo0d