
A Digital Human Demo Based on the Wav2Lip-GFPGAN Deep Learning Models

2 years ago. Author: 山河已無恙. Category: Toy Blog


Preface

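I ran into this at work and wrote up some brief notes
This post is a demo of setting up and running the Wav2Lip-GFPGAN environment
Where my understanding falls short, corrections are welcome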

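For each person there is only one true duty: to find oneself, and then to hold to it in one's heart for a lifetime, wholeheartedly and without rest. All other paths are incomplete; they are a way of escape, a cowardly return to the ideals of the crowd, a drifting with the current, a fear of one's own heart. (Hermann Hesse, Demian)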

A brief introduction to the demo

Wav2Lip-GAN

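Wav2Lip-GAN is a speech-to-lip-sync model trained in a generative adversarial network (GAN) setup. https://github.com/Rudrabha/Wav2Lip

The basic idea is to train a generator network on speech audio and face images so that, given input speech, it produces the matching lip shapes.

Training involves the following networks:

A generator, which takes a short window of the audio spectrogram together with face frames whose lower half is masked (plus reference frames of the same speaker) and generates face frames with the mouth region filled in;
A pre-trained lip-sync discriminator (the "lip-sync expert"), which scores whether the generated mouth movement matches the audio. No speech-to-text step is involved: the model works directly on the audio spectrogram;
In the Wav2Lip + GAN variant, an additional visual quality discriminator trained adversarially against the generator, which pushes the generated faces to look more realistic.

At test time, given a speech clip and a face image or video, the model generates a sequence of frames whose lip movements follow the speech, which is the speech-to-lip conversion.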
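To make the pairing of audio and video frames more concrete, here is a minimal sketch of how one window of spectrogram steps could be picked for each video frame. The rate of 80 spectrogram steps per second and the 16-step window are assumptions chosen for illustration, and `mel_windows` is a hypothetical helper, not a function from the Wav2Lip repository.

```python
def mel_windows(num_mel_steps, fps, steps_per_second=80, window=16):
    """Return one (start, end) spectrogram index range per video frame.

    steps_per_second and window are illustrative assumptions.
    """
    if num_mel_steps < window:
        # Not even one full window of audio is available.
        return []
    windows = []
    frame = 0
    while True:
        start = int(frame * steps_per_second / fps)
        if start + window > num_mel_steps:
            # Not enough audio left for a full window: reuse the last full one and stop.
            windows.append((num_mel_steps - window, num_mel_steps))
            break
        windows.append((start, start + window))
        frame += 1
    return windows


if __name__ == "__main__":
    # One second of audio (80 steps) paired with 25 fps video.
    for start, end in mel_windows(80, fps=25)[:3]:
        print(start, end)
```

Each video frame gets its own slice of audio, and neighbouring slices overlap, which is why the mouth shape can follow the sound smoothly from frame to frame.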

GFPGAN

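Tencent's GFPGAN is a GAN-based blind face restoration model. It repairs low-quality faces (blurry, noisy, compressed or low-resolution) and can upscale the result. https://github.com/TencentARC/GFPGAN

The basic idea is to take a degraded face image as input and turn it into a clean, higher-resolution face through a generator network.

The model is trained with two adversarial parts:

A generator, which combines a degradation-removal network with a pre-trained face GAN (StyleGAN2) used as a generative facial prior, and turns the degraded face into a restored one;
A discriminator, which judges whether the restored image looks realistic.

The two are trained in the GAN framework so that the restored image is as close as possible to a real one. At test time, given a low-quality face image, the model outputs the corresponding restored image. According to the GFPGAN paper, the facial prior is injected through channel-split spatial feature transform layers, and training adds facial component and identity-preserving losses to the adversarial loss, which helps balance realism and fidelity.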

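The demo is built on the project below, which you can consult directly. Its author provides an ipynb demo, GitHub\Wav2Lip-GFPGAN\Wav2Lip-GFPGAN.ipynb. If you have some background, just follow the steps there and you can skip the rest of this post.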

https://github.com/ajay-sainy/Wav2Lip-GFPGAN/


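If you run into difficulties, you can clone the repository below instead. It is a fork of the project above and adds the environment setup steps used here, plus the required sample material and scripts: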

https://github.com/LIRUILONGS/Wav2Lip-GFPGAN_Python_Demo

Models and installers to download

Wav2Lip

The download links are listed in the project: https://github.com/Rudrabha/Wav2Lip

Wav2Lip: https://iiitaphyd-my.sharepoint.com/:u:/g/personal/radrabha_m_research_iiit_ac_in/Eb3LEzbfuKlJiR600lQWRxgBIY27JZg80f7V9jtMfbNDaQ?e=TBFBVW

Wav2Lip + GAN: https://iiitaphyd-my.sharepoint.com/:u:/g/personal/radrabha_m_research_iiit_ac_in/EdjI7bZlgApMqsVoEUUXpLsBxqXbn5z8VTmoxp55YNDcIA?e=n9ljGW

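ffmpeg: https://www.gyan.dev/ffmpeg/builds/ffmpeg-git-essentials.7z (on Linux, install it directly with the package manager)

After installing ffmpeg on Windows, its bin directory has to be added to the PATH environment variable; I won't go into the details here.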
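A quick way to confirm that the environment variable is set up is to ask Python whether it can find ffmpeg on PATH. This is a minimal sketch; `find_ffmpeg` is a helper name made up for this example.

```python
import shutil
import subprocess


def find_ffmpeg():
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    ffmpeg_path = find_ffmpeg()
    if ffmpeg_path is None:
        print("ffmpeg was not found on PATH; check the environment variable setup")
    else:
        # "-version" prints the build information and exits.
        result = subprocess.run([ffmpeg_path, "-version"], capture_output=True, text=True)
        lines = result.stdout.splitlines()
        print(lines[0] if lines else ffmpeg_path)
```

If this prints the "not found" message, open a new terminal after editing PATH, since terminals that are already open keep the old value.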

GFPGAN

GFPGANv1.3.pth:https://github.com/TencentARC/GFPGAN/releases/download/v1.3.0/GFPGANv1.3.pth

parsing_parsenet.pth:https://github.com/xinntao/facexlib/releases/download/v0.2.2/parsing_parsenet.pth

detection_Resnet50_Final.pth:https://github.com/xinntao/facexlib/releases/download/v0.1.0/detection_Resnet50_Final.pth
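Before running anything, it can save time to check that the downloaded weights sit where the commands expect them. The sketch below assumes the relative locations used by the commands later in this post; `missing_files` and `EXPECTED_FILES` are names invented for this example, so adjust the list if your layout differs.

```python
from pathlib import Path

# Locations taken from the commands in this post; adjust them if your layout differs.
EXPECTED_FILES = [
    "Wav2Lip-master/checkpoints/wav2lip.pth",
    "inputs/wav2lip_gan.pth",
    "GFPGAN-master/experiments/pretrained_models/GFPGANv1.3.pth",
]


def missing_files(project_root, relative_paths=EXPECTED_FILES):
    """Return the expected files that are absent under project_root."""
    root = Path(project_root)
    return [p for p in relative_paths if not (root / p).is_file()]


if __name__ == "__main__":
    for name in missing_files("."):
        print("missing:", name)
```

Run it from the Wav2Lip-GFPGAN folder; no output means every file in the list was found.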

Environment setup

wav2lip environment

The system used here is Windows 11 with Anaconda3, running on CPU. Create the virtual environment:

C:\Users\liruilong>conda create -n wav2lip python=3.8
C:\Users\liruilong>conda info --envs
# conda environments:
#
base                  *  C:\ProgramData\Anaconda3
myenv                    C:\Users\liruilong\AppData\Local\conda\conda\envs\myenv
wav2lip                  C:\Users\liruilong\AppData\Local\conda\conda\envs\wav2lip

Switching to the virtual environment from a plain command prompt raised an error

C:\Users\liruilong>conda activate wav2lip
.....

It then worked normally from Anaconda Prompt (Anaconda3)

(base) C:\Users\山河已無恙\Documents\GitHub\Wav2Lip-GFPGAN>conda activate wav2lip

(wav2lip) C:\Users\山河已無恙\Documents\GitHub\Wav2Lip-GFPGAN>conda list
.....
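To double-check from inside Python that the intended environment is the active one, a small sketch like the following can help. It relies on the CONDA_DEFAULT_ENV variable that conda sets on activation; `active_conda_env` is a helper name made up for this example.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if none is active."""
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV")


if __name__ == "__main__":
    print("conda env:", active_conda_env())
    # sys.prefix shows which Python installation is actually running.
    print("python prefix:", sys.prefix)
```

For the setup in this post, the first line should show wav2lip and the prefix should point into the envs\wav2lip folder.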

Install the dependencies listed in requirements.txt. Installing them directly failed:

(wav2lip) C:\Users\山河已無恙\Documents\GitHub\Wav2Lip-GFPGAN>pip install -r requirements.txt   -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
Looking in indexes: http://pypi.douban.com/simple/

The --use-pep517 option has to be added:

(wav2lip) C:\Users\山河已無恙\Documents\GitHub\Wav2Lip-GFPGAN>pip install -r requirements.txt   -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com  --use-pep517
Looking in indexes: http://pypi.douban.com/simple/

Check the wav2lip environment by running the demo. The project ships with some sample material; the wav2lip.pth model is used here:

(wav2lip) C:\Users\山河已無恙\Documents\GitHub\Wav2Lip-GFPGAN>python .\Wav2Lip-master\inference.py --checkpoint_path .\Wav2Lip-master\checkpoints\wav2lip.pth --face .\inputs\kim_7s_raw.mp4 --audio .\inputs\kim_audio.mp3 --outfile result.mp4
Using cpu for inference.
Reading video frames...
Number of frames available for inference: 223
Extracting raw audio...
...................................
[libx264 @ 0000022caf538200] Weighted P-Frames: Y:1.2% UV:1.2%
[libx264 @ 0000022caf538200] ref P L0: 68.7%  8.6% 16.2%  6.4%
[libx264 @ 0000022caf538200] ref B L0: 75.0% 20.2%  4.8%
[libx264 @ 0000022caf538200] ref B L1: 94.9%  5.1%
[libx264 @ 0000022caf538200] kb/s:1433.66
[aac @ 0000022caf528940] Qavg: 237.868

When the run finishes, a result.mp4 file is generated in the current directory

https://www.bilibili.com/video/BV1fX4y187jW/

Then try again with the wav2lip_gan.pth model

(wav2lip) C:\Users\山河已無恙\Documents\GitHub\Wav2Lip-GFPGAN>python .\Wav2Lip-master\inference.py --checkpoint_path  .\inputs\wav2lip_gan.pth --face .\inputs\kim_7s_raw.mp4 --audio .\inputs\kim_audio.mp3 --outfile result.mp4
Using cpu for inference.

https://www.bilibili.com/video/BV1Vo4y1T7F2/

At this point the wav2lip environment is set up
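The two inference runs above differ only in the checkpoint path, so the command can be assembled by a small helper. This is a sketch under that assumption; `build_inference_command` is a hypothetical name, and the flags are exactly the ones used in the commands above.

```python
import subprocess
import sys


def build_inference_command(checkpoint, face, audio, outfile="result.mp4",
                            script="Wav2Lip-master/inference.py"):
    """Assemble the argument list for the Wav2Lip inference script."""
    return [
        sys.executable, script,
        "--checkpoint_path", checkpoint,
        "--face", face,
        "--audio", audio,
        "--outfile", outfile,
    ]


if __name__ == "__main__":
    for checkpoint in ("Wav2Lip-master/checkpoints/wav2lip.pth", "inputs/wav2lip_gan.pth"):
        cmd = build_inference_command(checkpoint, "inputs/kim_7s_raw.mp4", "inputs/kim_audio.mp3")
        print(" ".join(cmd))
        # subprocess.run(cmd, check=True)  # uncomment to actually run inference
```

Passing the arguments as a list avoids quoting problems when a path contains spaces or non-ASCII characters, as the user folder in this post does.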

GFPGAN environment

Prepare a new video and audio clip, generate the lip-synced video with wav2lip_gan, then set up the GFPGAN environment

(wav2lip) C:\Users\山河已無恙\Documents\GitHub\Wav2Lip-GFPGAN>python .\Wav2Lip-master\inference.py --checkpoint_path  .\inputs\wav2lip_gan.pth --face .\inputs\demo.mp4 --audio .\inputs\demo_5_y.mp3 --outfile result.mp4
Using cpu for inference.
Reading video frames...
Number of frames available for inference: 2116
Extracting raw audio..
。。。。。。。。。。。。。。。。。。。。。
[libx264 @ 000001ba2a798d80] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 18% 18% 48%  3%  2%  2%  2%  3%  3%
[libx264 @ 000001ba2a798d80] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 23% 22% 17%  6%  6%  6%  6%  7%  8%
[libx264 @ 000001ba2a798d80] i8c dc,h,v,p: 49% 20% 22%  8%
[libx264 @ 000001ba2a798d80] Weighted P-Frames: Y:0.0% UV:0.0%
[libx264 @ 000001ba2a798d80] ref P L0: 80.9% 10.0%  6.6%  2.5%
[libx264 @ 000001ba2a798d80] ref B L0: 87.8% 10.5%  1.7%
[libx264 @ 000001ba2a798d80] ref B L1: 98.7%  1.3%
[libx264 @ 000001ba2a798d80] kb/s:703.37
[aac @ 000001ba2a79a780] Qavg: 170.234

(wav2lip) C:\Users\山河已無恙\Documents\GitHub\Wav2Lip-GFPGAN>

https://www.bilibili.com/video/BV1cX4y1h7k8/

Create a folder for the results

PS C:\Users\山河已無恙\Documents\GitHub\Wav2Lip-GFPGAN> mkdir results


    Directory: C:\Users\山河已無恙\Documents\GitHub\Wav2Lip-GFPGAN


Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d-----          2023/6/9      7:14                results


PS C:\Users\山河已無恙\Documents\GitHub\Wav2Lip-GFPGAN>

Move the result.mp4 generated above into this folder, then run the script below

# day1.py
# Split the video produced by Wav2Lip into individual JPEG frames.

import os
from os import path

import cv2
from tqdm import tqdm

wav2lipFolderName = 'Wav2Lip-master'
gfpganFolderName = 'GFPGAN-master'
wav2lipPath = '.\\' + wav2lipFolderName
gfpganPath = '.\\' + gfpganFolderName
outputPath = ".\\results"

# Video generated in the previous step
inputVideoPath = outputPath + '\\result.mp4'
# Intermediate data: one image per frame
unProcessedFramesFolderPath = outputPath + '\\frames'

os.makedirs(unProcessedFramesFolderPath, exist_ok=True)

vidcap = cv2.VideoCapture(inputVideoPath)
numberOfFrames = int(vidcap.get(cv2.CAP_PROP_FRAME_COUNT))
fps = vidcap.get(cv2.CAP_PROP_FPS)
print("FPS: ", fps, "Frames: ", numberOfFrames)

for frameNumber in tqdm(range(numberOfFrames)):
    success, image = vidcap.read()
    if not success:
        # The reported frame count can exceed the number of decodable frames
        break
    cv2.imwrite(path.join(unProcessedFramesFolderPath, str(frameNumber).zfill(4) + '.jpg'), image)
vidcap.release()

print("unProcessedFramesFolderPath:", unProcessedFramesFolderPath)
print("inputVideoPath:", inputVideoPath)

The script reads the video produced by wav2lip frame by frame, saves each frame as a JPEG image, and stores the images in the folder given by unProcessedFramesFolderPath
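One detail worth keeping in mind: the frame files are named with four zero-padded digits, and the later merge step sorts the file names as strings. That works for clips under 10000 frames. For longer clips the padding would need to grow, otherwise "10000.jpg" sorts before "9999.jpg". A minimal sketch of a wider naming scheme follows; `frame_name` is a hypothetical helper and is not part of the original script.

```python
def frame_name(index, total_frames, min_width=4, ext=".jpg"):
    """Build a zero-padded frame file name wide enough for total_frames."""
    width = max(min_width, len(str(max(total_frames - 1, 0))))
    return str(index).zfill(width) + ext


if __name__ == "__main__":
    # Same names as the original script for a short clip.
    print(frame_name(7, 1793))
    # Fixed four-digit padding breaks string ordering past 9999 frames.
    print(sorted(["9999.jpg", "10000.jpg"]))
    # Padding derived from the frame count keeps the order intact.
    print(sorted(frame_name(i, 12000) for i in (10000, 9999)))
```

For the 1793-frame clip in this post the names are identical to what day1.py writes, so nothing changes unless the video is much longer.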

(wav2lip) C:\Users\liruilong\Documents\GitHub\Wav2Lip-GFPGAN>python day1.py
FPS:  25.0 Frames:  1793
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1793/1793 [00:10<00:00, 166.99it/s]
unProcessedFramesFolderPath:  
inputVideoPath: .\results\result.mp4

(wav2lip) C:\Users\liruilong\Documents\GitHub\Wav2Lip-GFPGAN>

Afterwards the extracted frames can be found in .\results\frames

Now set up the GFPGAN-master environment

(wav2lip) C:\Users\liruilong\Documents\GitHub\Wav2Lip-GFPGAN\GFPGAN-master>pip install -r requirements.txt -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com --use-pep517
Looking in indexes: http://pypi.douban.com/simple/
..........
Installing collected packages: numpy, scikit-image
  Attempting uninstall: numpy
    Found existing installation: numpy 1.23.5
    Uninstalling numpy-1.23.5:
      Successfully uninstalled numpy-1.23.5
  Attempting uninstall: scikit-image
    Found existing installation: scikit-image 0.20.0
    Uninstalling scikit-image-0.20.0:
      Successfully uninstalled scikit-image-0.20.0
Successfully installed numpy-1.20.3 scikit-image-0.19.3

(wav2lip) C:\Users\liruilong\Documents\GitHub\Wav2Lip-GFPGAN\GFPGAN-master>

Put the GFPGANv1.3.pth model into the /experiments/pretrained_models directory

(wav2lip) C:\Users\liruilong\Documents\GitHub\Wav2Lip-GFPGAN\GFPGAN-master>mkdir -p .\\experiments\pretrained_models

(wav2lip) C:\Users\liruilong\Documents\GitHub\Wav2Lip-GFPGAN\GFPGAN-master>cd  .\\experiments\pretrained_models

Confirm that the model is in place

    Directory: C:\Users\山河已無恙\Documents\GitHub\Wav2Lip-GFPGAN\GFPGAN-master\experiments\pretrained_models


Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a----          2023/6/7      1:43      348632874 GFPGANv1.3.pth

Then run the following command

python inference_gfpgan.py -i $unProcessedFramesFolderPath -o $outputPath -v 1.3 -s 2 --only_center_face --bg_upsampler None

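Replace the variables with the actual paths. If the auxiliary models cannot be downloaded automatically, put the files downloaded earlier into the locations shown in the output below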
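The variable substitution can also be done programmatically. The sketch below uses Python's string.Template, whose $name placeholders happen to match the notation of the command above; `fill_gfpgan_command` is a helper name made up for this example.

```python
from string import Template

# The command from this post, with its two placeholders left intact.
GFPGAN_COMMAND = Template(
    "python inference_gfpgan.py -i $unProcessedFramesFolderPath -o $outputPath "
    "-v 1.3 -s 2 --only_center_face --bg_upsampler None"
)


def fill_gfpgan_command(frames_dir, output_dir):
    """Return the GFPGAN command line with both paths filled in."""
    return GFPGAN_COMMAND.substitute(
        unProcessedFramesFolderPath=frames_dir,
        outputPath=output_dir,
    )


if __name__ == "__main__":
    # Paths are relative to the GFPGAN-master folder, as in this post.
    print(fill_gfpgan_command(r"..\results\frames", r"..\results"))
```

substitute raises a KeyError if a placeholder is left unfilled, which catches a forgotten path before the long-running job starts.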

(wav2lip) C:\Users\liruilong\Documents\GitHub\Wav2Lip-GFPGAN\GFPGAN-master>python inference_gfpgan.py -i ..\results\frames -o ..\results -v 1.3 -s 2 --only_center_face --bg_upsampler None
C:\Users\liruilong\AppData\Local\conda\conda\envs\wav2lip\lib\site-packages\torchvision\transforms\functional_tensor.py:5: UserWarning: The torchvision.transforms.functional_tensor module is deprecated in 0.15 and will be **removed in 0.17**. Please don't rely on it. You probably just need to use APIs in torchvision.transforms.functional or in torchvision.transforms.v2.functional.
  warnings.warn(
C:\Users\liruilong\AppData\Local\conda\conda\envs\wav2lip\lib\site-packages\torchvision\models\_utils.py:208: UserWarning: The parameter 'pretrained' is deprecated since 0.13 and may be removed in the future, please use 'weights' instead.
  warnings.warn(
C:\Users\liruilong\AppData\Local\conda\conda\envs\wav2lip\lib\site-packages\torchvision\models\_utils.py:223: UserWarning: Arguments other than a weight enum or `None` for 'weights' are deprecated since 0.13 and may be removed in the future. The current behavior is equivalent to passing `weights=None`.
  warnings.warn(msg)
Downloading: "https://github.com/xinntao/facexlib/releases/download/v0.1.0/detection_Resnet50_Final.pth" to C:\Users\liruilong\AppData\Local\conda\conda\envs\wav2lip\lib\site-packages\facexlib\weights\detection_Resnet50_Final.pth

100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 104M/104M [00:06<00:00, 16.1MB/s]
Downloading: "https://github.com/xinntao/facexlib/releases/download/v0.2.2/parsing_parsenet.pth" to C:\Users\liruilong\AppData\Local\conda\conda\envs\wav2lip\lib\site-packages\facexlib\weights\parsing_parsenet.pth

100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 81.4M/81.4M [00:05<00:00, 14.8MB/s]
0it [00:00, ?it/s]
  warnings.warn(msg)
  0%|                                                                                                                                                                                  | 0/1793 [00:00<?, ?it/s]Processing 0000.jpg ...
  0%|                                                                                                                                                                        | 1/1793 [00:06<3:18:38,  6.65s/it]Processing 0001.jpg ...
  0%|▏                                                                                                                                                                       | 2/1793 [00:13<3:18:06,  6.64s/it]P
...............................
(wav2lip) C:\Users\liruilong\Documents\GitHub\Wav2Lip-GFPGAN\GFPGAN-master>

Once that has finished, the restored images need to be assembled back into a video. Run the script below



# day2.py
# Assemble the restored frames into batch videos and write the ffmpeg concat list.

import os

import cv2
from tqdm import tqdm

outputPath = ".\\results"

restoredFramesPath = outputPath + '\\restored_imgs\\'
processedVideoOutputPath = outputPath

dir_list = os.listdir(restoredFramesPath)
dir_list.sort()

batch = 0
batchSize = 300
# Hard-coded frame rate of the output video. day1.py reported 25.0 FPS for this
# clip, so set this to the source frame rate if the audio should stay in sync.
fps = 30

for i in tqdm(range(0, len(dir_list), batchSize)):
    img_array = []
    size = None
    start, end = i, i + batchSize
    print("processing ", start, end)
    for filename in tqdm(dir_list[start:end]):
        img = cv2.imread(restoredFramesPath + filename)
        if img is None:
            continue
        height, width, layers = img.shape
        size = (width, height)
        img_array.append(img)

    if not img_array:
        # No readable image in this batch, so there is nothing to write
        continue

    out = cv2.VideoWriter(processedVideoOutputPath + '\\batch_' + str(batch).zfill(4) + '.avi',
                          cv2.VideoWriter_fourcc(*'DIVX'), fps, size)
    batch = batch + 1

    for img in img_array:
        out.write(img)
    out.release()

# List of batch files for ffmpeg's concat demuxer
concatTextFilePath = outputPath + "\\concat.txt"
with open(concatTextFilePath, "w") as concatTextFile:
    for ips in range(batch):
        concatTextFile.write("file batch_" + str(ips).zfill(4) + ".avi\n")

concatedVideoOutputPath = outputPath + "\\concated_output.avi"
print("concatedVideoOutputPath:", concatedVideoOutputPath)

finalProcessedOuputVideo = processedVideoOutputPath + '\\final_with_audio.avi'
print("finalProcessedOuputVideo:", finalProcessedOuputVideo)

# Next steps, run with ffmpeg:
# ffmpeg -y -f concat -i {concatTextFilePath} -c copy {concatedVideoOutputPath}
# ffmpeg -y -i {concatedVideoOutputPath} -i {inputAudioPath} -map 0 -map 1:a -c:v copy -shortest {finalProcessedOuputVideo}

(wav2lip) C:\Users\山河已無恙\Documents\GitHub\Wav2Lip-GFPGAN>python day2.py
  0%|                                                                                                                                                                                     | 0/6 [00:00<?, ?it/s]processing  0 300

  0%|                                                                                                                                                                                   | 0/300 [00:00<?, ?it/s]
  4%|██████▏                                                                                                                                                                  | 11/300 [00:00<00:02, 107.59it/s]
  7%|███████████▊                                                                                                                                                             | 21/300 [00:00<00:02, 104.49it/s]
 11%|██████████████████
 ...................
 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 293/293 [00:02<00:00, 107.10it/s]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:25<00:00,  4.26s/it]
concatedVideoOutputPath: .\results\concated_output.avi
finalProcessedOuputVideo: .\results\final_with_audio.avi

(wav2lip) C:\Users\山河已無恙\Documents\GitHub\Wav2Lip-GFPGAN>
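The progress output above shows 6 batches, with 293 frames in the last one. That follows from cutting 1793 frames into chunks of 300, which keeps only one chunk of images in memory at a time. A minimal sketch of the split, with `batch_ranges` as a hypothetical helper name:

```python
def batch_ranges(total, batch_size=300):
    """Yield (start, end) index pairs that cover range(total) in batch_size chunks."""
    for start in range(0, total, batch_size):
        yield start, min(start + batch_size, total)


if __name__ == "__main__":
    ranges = list(batch_ranges(1793))
    print(len(ranges), "batches")
    for start, end in ranges:
        print(start, end, end - start)
```

Unlike the "processing" lines printed by the script, the end index here is clipped to the number of frames, so the last pair reads 1500 1793 rather than 1500 1800.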

Merge the batch videos with ffmpeg

PS C:\Users\山河已無恙\Documents\GitHub\Wav2Lip-GFPGAN> cd .\results\
PS C:\Users\山河已無恙\Documents\GitHub\Wav2Lip-GFPGAN\results> ffmpeg -y -f concat -i .\concat.txt  -c copy .\concated_output.avi
.....................
frame= 1793 fps=0.0 q=-1.0 Lsize=   24625kB time=00:00:59.76 bitrate=3375.3kbits/s speed=1.76e+03x
video:24577kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.197566%
PS C:\Users\山河已無恙\Documents\GitHub\Wav2Lip-GFPGAN\results> ls


    Directory: C:\Users\山河已無恙\Documents\GitHub\Wav2Lip-GFPGAN\results


Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
d-----          2023/6/9      7:25                frames
d-----          2023/6/9     11:03                restored_imgs
-a----          2023/6/9     11:42        4231050 batch_0000.avi
-a----          2023/6/9     11:42        4274254 batch_0001.avi
-a----          2023/6/9     11:42        4281898 batch_0002.avi
-a----          2023/6/9     11:42        4165970 batch_0003.avi
-a----          2023/6/9     11:42        4222324 batch_0004.avi
-a----          2023/6/9     11:42        4069836 batch_0005.avi
-a----          2023/6/9     11:42            126 concat.txt
-a----          2023/6/9     11:52       25216450 concated_output.avi
-a----          2023/6/9      7:22        7515594 result.mp4

Merge the video and the audio with ffmpeg

PS C:\Users\山河已無恙\Documents\GitHub\Wav2Lip-GFPGAN\results> ffmpeg -y -i .\concated_output.avi -i ..\inputs\demo_5_y.mp3  -map 0 -map 1:a -c:v copy -shortest  .\final_with_audio.avi
ffmpeg version git-2020-08-31-4a11a6f Copyright (c) 2000-2020 the FFmpeg developers
........
frame= 1793 fps=699 q=-1.0 Lsize=   25618kB time=00:00:59.76 bitrate=3511.2kbits/s speed=23.3x
video:24577kB audio:934kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: 0.417315%
PS C:\Users\山河已無恙\Documents\GitHub\Wav2Lip-GFPGAN\results>
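One thing to watch in the logs: the frame extraction step reported 25.0 FPS and 1793 frames, while the batch videos are written at 30 FPS and ffmpeg reports a duration of 00:00:59.76. A short worked calculation shows where that duration comes from. That this causes audio drift is my reading of the logs, not something verified in this post.

```python
def duration_seconds(frame_count, fps):
    """Playback time of frame_count frames shown at fps frames per second."""
    return frame_count / fps


if __name__ == "__main__":
    frames = 1793
    # Written at 30 FPS, as in the merge script.
    print(round(duration_seconds(frames, 30), 2))
    # Played at the 25 FPS reported for the source clip.
    print(round(duration_seconds(frames, 25), 2))
```

The same frames last about 59.77 s at 30 FPS but 71.72 s at 25 FPS, so if the audio was matched to the 25 FPS clip, writing the output at the source frame rate should keep the lips and the sound aligned.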

Result

https://www.bilibili.com/video/BV1914y1U7dH/

That is all I have to share about the demo

References for parts of this post

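Copyright of the linked reference material belongs to the original authors; please let me know of any infringement. This is an open-source project, so if you find it useful, don't be stingy with your stars.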

https://github.com/ajay-sainy/Wav2Lip-GFPGAN

© 2018-2023 liruilonger@gmail.com, All rights reserved. Attribution-NonCommercial-ShareAlike (CC BY-NC-SA 4.0)
