Recently, Bert-vits2 released its latest version, 2.3-final. As the name suggests, it is intended to be the last release: it fixes a number of known bugs and adds a WavLM-based Discriminator (borrowed from StyleTTS2). Surprisingly, because its emotion control performed poorly, the CLAP emotion model was removed and replaced with a comparatively simple BERT-based semantic fusion approach.
In fact, testing of version 2.2 showed that the CLAP emotion model worked reasonably well. For more on version 2.2, see:
Bert-vits2-v2.2 local training and inference all-in-one package (Genshin Impact Yae Miko English model)
For further details, follow the official Bert-vits2 releases page:
https://github.com/fishaudio/Bert-VITS2/releases/tag/v2.3
This time, we will use the latest Bert-vits2-2.3 to recreate the voice of Ada Wong, a classic Resident Evil character.
Bert-vits2-2.3 Project Setup
First, clone the project:
git clone https://github.com/v3ucn/Bert-vits2-V2.3.git
Note that this project is forked from the 2.3 branch of Bert-vits2, adding features such as audio slicing and transcription/annotation on top of it, making it easier to use.
Then enter the project directory:
cd Bert-vits2-V2.3
Install the dependencies:
pip3 install -r requirements.txt
Next, download the required models, starting with the bert models:
https://openi.pcl.ac.cn/Stardust_minus/Bert-VITS2/modelmanage/show_model
and place them in the bert directory:
E:\work\Bert-VITS2-2.3\bert>tree /f
Folder PATH listing for volume myssd
Volume serial number is 7CE3-15AE
E:.
│ bert_models.json
│
├───bert-base-japanese-v3
│ .gitattributes
│ config.json
│ README.md
│ tokenizer_config.json
│ vocab.txt
│
├───bert-large-japanese-v2
│ .gitattributes
│ config.json
│ README.md
│ tokenizer_config.json
│ vocab.txt
│
├───chinese-roberta-wwm-ext-large
│ .gitattributes
│ added_tokens.json
│ config.json
│ pytorch_model.bin
│ README.md
│ special_tokens_map.json
│ tokenizer.json
│ tokenizer_config.json
│ vocab.txt
│
├───deberta-v2-large-japanese
│ .gitattributes
│ config.json
│ pytorch_model.bin
│ README.md
│ special_tokens_map.json
│ tokenizer.json
│ tokenizer_config.json
│
├───deberta-v2-large-japanese-char-wwm
│ .gitattributes
│ config.json
│ pytorch_model.bin
│ README.md
│ special_tokens_map.json
│ tokenizer_config.json
│ vocab.txt
│
└───deberta-v3-large
.gitattributes
config.json
generator_config.json
pytorch_model.bin
README.md
spm.model
tokenizer_config.json
Note that the pytorch_model.bin in each subdirectory is the bert model itself.
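Before moving on, it can save a failed run later to confirm each model actually finished downloading. Below is a hypothetical helper (not part of the repo) that reports which bert/ subfolders still lack their pytorch_model.bin:

```python
import os

def missing_bert_weights(bert_dir):
    """Return the names of subdirectories under bert_dir that do not
    contain a pytorch_model.bin weight file (illustrative helper)."""
    missing = []
    for name in sorted(os.listdir(bert_dir)):
        sub = os.path.join(bert_dir, name)
        if os.path.isdir(sub) and not os.path.isfile(
            os.path.join(sub, "pytorch_model.bin")
        ):
            missing.append(name)
    return missing
```

Running `print(missing_bert_weights("bert"))` from the project root should print an empty list once all the weights are in place.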
Next, you still need to download the clap model (yes, even though CLAP has been removed from inference), and also download the wav2vec2-large-robust-12-ft-emotion-msp-dim model; place both in the project's emotional directory:
E:\work\Bert-VITS2-2.3\emotional>tree /f
Folder PATH listing for volume myssd
Volume serial number is 7CE3-15AE
E:.
├───clap-htsat-fused
│ .gitattributes
│ config.json
│ merges.txt
│ preprocessor_config.json
│ pytorch_model.bin
│ README.md
│ special_tokens_map.json
│ tokenizer.json
│ tokenizer_config.json
│ vocab.json
│
└───wav2vec2-large-robust-12-ft-emotion-msp-dim
.gitattributes
config.json
LICENSE
preprocessor_config.json
pytorch_model.bin
README.md
vocab.json
Finally, download the base model:
https://huggingface.co/OedoSoldier/Bert-VITS2-2.3
and place it in the character's models directory.
Please note that the 2.3 base model consists of four files.
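A quick sketch to verify the base model is complete before training. The four filenames below are my assumption based on the 2.3 release (with WD_0.pth being the new WavLM discriminator weight); adjust them to match what the download actually contains:

```python
import os

# Assumed filenames for the 2.3 base model -- verify against your download.
EXPECTED = ["DUR_0.pth", "D_0.pth", "G_0.pth", "WD_0.pth"]

def check_base_models(models_dir, expected=EXPECTED):
    """Return the expected base-model files that are missing from models_dir."""
    present = set(os.listdir(models_dir))
    return [f for f in expected if f not in present]
```

An empty return value means all four checkpoints are in place.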
Bert-vits2-2.3 Data Preprocessing
Place Ada Wong's voice material in the Data/ada/raw directory, then run the slicing script:
python3 audio_slicer.py
This splits the material into short clips:
E:\work\Bert-VITS2-2.3\Data\ada\raw>tree /f
Folder PATH listing for volume myssd
Volume serial number is 7CE3-15AE
E:.
ada_0.wav
ada_1.wav
ada_10.wav
ada_11.wav
ada_12.wav
ada_13.wav
ada_14.wav
ada_15.wav
ada_16.wav
ada_17.wav
ada_18.wav
ada_19.wav
ada_2.wav
ada_20.wav
ada_21.wav
ada_22.wav
ada_23.wav
ada_24.wav
ada_25.wav
ada_26.wav
ada_3.wav
ada_4.wav
ada_5.wav
ada_6.wav
ada_7.wav
ada_8.wav
ada_9.wav
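The project's audio_slicer.py handles the splitting; conceptually, it cuts wherever the waveform stays near-silent for long enough. A minimal numpy sketch of that idea (illustrative only, not the project's actual implementation):

```python
import numpy as np

def slice_on_silence(samples, sr, threshold=0.01, min_silence=0.3):
    """Split a mono waveform into (start, end) sample index pairs,
    cutting wherever the signal stays below `threshold` amplitude for
    at least `min_silence` seconds (toy silence-based slicer)."""
    win = max(1, int(min_silence * sr))
    quiet = np.abs(samples) < threshold
    segments, start, i, n = [], None, 0, len(samples)
    while i < n:
        if quiet[i]:
            j = i
            while j < n and quiet[j]:
                j += 1
            # Only treat it as a cut point if the silent run is long enough.
            if j - i >= win and start is not None:
                segments.append((start, i))
                start = None
            i = j
        else:
            if start is None:
                start = i
            i += 1
    if start is not None:
        segments.append((start, n))
    return segments
```

Each returned pair would then be written out as its own wav clip, which is what produces the ada_0.wav, ada_1.wav, ... files above.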
Next, run transcription and annotation:
python3 short_audio_transcribe.py
The program outputs:
E:\work\Bert-VITS2-2.3\venv\lib\site-packages\whisper\timing.py:58: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.
def backtrace(trace: np.ndarray):
Data/ada/raw
Detected language: en
I do. The kind you like.
Processed: 1/27
Detected language: en
Now where's the amber?
Processed: 2/27
Detected language: en
Leave the girl. She's lost no matter what.
Processed: 3/27
Detected language: en
You walk away now, and who knows?
Processed: 4/27
Detected language: en
Maybe you'll live to meet me again.
Processed: 5/27
Detected language: en
And I might get you that greeting you were looking for.
Processed: 6/27
Detected language: en
How about we continue this discussion another time?
Processed: 7/27
Detected language: en
Sorry, nothing yet.
Processed: 8/27
Detected language: en
But my little helper is creating
Processed: 9/27
Detected language: en
Quite the commotion.
Processed: 10/27
Detected language: en
Everything will work out just fine.
Processed: 11/27
Detected language: en
He's a good boy. Predictable.
Processed: 12/27
Detected language: en
The deal was, we get you out of here when you deliver the amber. No amber, no protection, Louise.
Processed: 13/27
Detected language: en
Nothing personal, Leon.
Processed: 14/27
Detected language: en
Louise and I had an arrangement.
Processed: 15/27
Detected language: en
Don't worry, I'll take good care of it.
Processed: 16/27
Detected language: en
Just one question.
Processed: 17/27
Detected language: en
What are you planning to do with this?
Processed: 18/27
Detected language: en
So, we're talking millions of casualties?
Processed: 19/27
Detected language: en
We're changing course. Now.
Processed: 20/27
Detected language: en
You can stop right there, Leon.
Processed: 21/27
Detected language: en
wouldn't make me use this.
Processed: 22/27
Detected language: en
Would you? You don't seem surprised.
Processed: 23/27
Detected language: en
Interesting.
Processed: 24/27
Detected language: en
Not a bad move
Processed: 25/27
Detected language: en
Very smooth. Ah, Leon.
Processed: 26/27
Detected language: en
You know I don't work and tell.
Note that whisper raises a warning here; if you find it distracting, you can edit line 58 of timing.py:
Before:
@numba.jit
def backtrace(trace: np.ndarray):
After:
@numba.jit(nopython=True)
def backtrace(trace: np.ndarray):
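After transcription, the script writes each clip's text into an annotation list. In the usual Bert-VITS2 layout this is a pipe-separated file with fields path|speaker|language|text; a minimal sketch of building one such line (the field order here is an assumption based on that convention):

```python
def filelist_line(wav_path, speaker, lang, text):
    """Format one annotation-list entry in the assumed Bert-VITS2 style:
    path|speaker|language|text, with language codes like ZH/JP/EN."""
    text = text.strip().replace("\n", " ")
    return f"{wav_path}|{speaker}|{lang}|{text}"

# Example entry for the first Ada clip transcribed above:
line = filelist_line("Data/ada/raw/ada_0.wav", "ada", "EN",
                     "I do. The kind you like.")
```

One line per clip is appended to the list, which the preprocessing step then splits into training and validation filelists.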
Next, launch the web preprocessing UI:
python3 webui_preprocess.py
Then simply follow the on-page instructions.
At this point, data preprocessing is complete.
Bert-vits2-2.3 Training and Inference
Run this command in the project root:
python3 train_ms.py
The model checkpoints are written to the models directory:
E:\work\Bert-VITS2-2.3\Data\ada\models>tree /f
Folder PATH listing for volume myssd
Volume serial number is 7CE3-15AE
E:.
G_150.pth
Then start the inference page and run inference:
python3 webui.py
The new inference page adds the ability to use the semantics of an auxiliary text to guide generation (its language must match the main text), i.e., customizing the style of the generated speech in the form of a prompt.
However, you cannot use instruction-style text (e.g., "happy"); you must use text that carries strong emotion (e.g., "I'm so happy!!!").
As a result, the emotional style of the generated speech is somewhat unpredictable:
you have to keep tweaking the prompt to test the effect, which is less intuitive than the previous CLAP audio prompt. Objectively speaking, though, style-conditioned emotional speech guided by BERT semantic text does work to some extent.
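Conceptually, the auxiliary text is run through BERT and its semantic features are mixed into the main text's features to steer the style. A toy numpy sketch of that "semantic steering" idea (the real fusion is learned inside the model; the pooling and mixing weight here are purely illustrative):

```python
import numpy as np

def blend_semantic_features(text_feat, prompt_feat, weight=0.7):
    """Toy sketch: nudge the main text's per-token BERT features toward the
    pooled features of an emotional prompt sentence. Not the project's
    actual fusion, which is learned end to end inside the model."""
    prompt_vec = prompt_feat.mean(axis=0)  # pool prompt tokens to one vector
    # Higher weight keeps more of the original text semantics.
    return weight * text_feat + (1.0 - weight) * prompt_vec
```

This also hints at why instruction-style prompts fail: a word like "happy" carries weak emotional semantics in BERT space, so it barely shifts the blended features, whereas an emphatic sentence shifts them noticeably.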
Conclusion
Updating this basic Bert-vits2 tutorial has been a learning experience in itself. Without question, Bert-vits2 has introduced many people to the appeal of deep learning; it is an excellent entry-level AI project, and interest is always the best teacher. Finally, here is the Bert-vits2-2.3-Final all-in-one package:
Package link: https://pan.baidu.com/s/182LZCu5cyR3nH8EoTBLR-g?pwd=v3uc
Shared with you all, enjoy.