Simple Usage of PaddleSpeech (Baidu PaddlePaddle)

Author: fj_changing


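PaddleSpeech is an open-source model library for speech, built on PaddlePaddle, for developing a range of key tasks in speech and audio. It includes many cutting-edge and influential deep-learning models; typical applications include speech recognition, speech translation (English to Chinese), speech synthesis, and punctuation restoration.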

I only used speech recognition (speech-to-text) and speech synthesis (text-to-speech).

Installation

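I have only used it on CentOS (a virtual machine running CentOS Linux release 7.9.2009 and a cloud server running CentOS Linux release 8.5.2111), because as of this writing (November 18, 2022) the official README says:

We strongly recommend installing PaddleSpeech on Linux with Python 3.7 or above.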

Linux

yum install gcc gcc-c++  # from https://github.com/PaddlePaddle/PaddleSpeech/blob/develop/docs/source/install_cn.md#linux
pip install paddlepaddle -i https://mirror.baidu.com/pypi/simple  # for the GPU version, get the command from the official website
pip install pytest-runner
pip install setuptools_scm  # required to install paddlespeech; without it the install fails with "ERROR: Could not find a version that satisfies the requirement setuptools_scm (from versions: none)" and "ERROR: No matching distribution found for setuptools_scm". These two errors are not highlighted; they appear below the highlighted 'error: subprocess-exited-with-error'. From https://github.com/PaddlePaddle/PaddleSpeech/issues/2150
pip install paddlespeech -i https://pypi.tuna.tsinghua.edu.cn/simple
# Download nltk_data as described in the installation docs and extract it into your home directory; text-to-speech needs it. From https://github.com/PaddlePaddle/PaddleSpeech/issues/2456
yum install libsndfile  # install this if running fails with "OSError: sndfile library not found" or "OSError: cannot load library 'libsndfile.so': libsndfile.so: cannot open shared object file: No such file or directory". From https://github.com/PaddlePaddle/PaddleSpeech/issues/2198 and https://github.com/PaddlePaddle/PaddleSpeech/issues/440, but the commands given in those two links are wrong
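After running the commands above, a quick way to see which prerequisites are actually in place is a short standard-library script. This is only a sketch under assumptions: `paddle` and `paddlespeech` are taken to be the import names of the two pip packages, `~/nltk_data` is assumed to be where the extracted nltk_data lives, and `check_environment` is a hypothetical helper. It only looks things up; it does not import PaddlePaddle or PaddleSpeech.

```python
import ctypes.util
import importlib.util
import os
import shutil
import sys


def check_environment():
    """Hypothetical helper: report which prerequisites appear to be present.

    Returns a dict mapping a short check name to True/False.
    """
    return {
        # The README recommends Linux with Python 3.7 or newer.
        "linux": sys.platform.startswith("linux"),
        "python>=3.7": sys.version_info >= (3, 7),
        # find_spec checks importability without actually importing the package.
        "paddlepaddle": importlib.util.find_spec("paddle") is not None,
        "paddlespeech": importlib.util.find_spec("paddlespeech") is not None,
        # If this is False, the libsndfile OSError above is likely at runtime.
        "libsndfile": ctypes.util.find_library("sndfile") is not None,
        # Assumed location of the extracted nltk_data (needed for TTS).
        "nltk_data": os.path.isdir(os.path.expanduser("~/nltk_data")),
        # ffmpeg is used later to convert audio to 16 kHz mono WAV.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:15} {'OK' if ok else 'MISSING'}")
```

Any entry reported as MISSING points back to the corresponding install step.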

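The package mirrors are specified because the installation docs recommend it:

Tip: We recommend using the Baidu mirror https://mirror.baidu.com/pypi/simple when installing paddlepaddle, and the Tsinghua mirror https://pypi.tuna.tsinghua.edu.cn/simple when installing paddlespeech.

The README, however, does not say to specify a mirror.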

For the GPU version, the CUDA version to choose depends on your graphics card model.

For installing the graphics card driver, see my other article.

Usage

If your machine does not have enough CPU or memory, the code may fail to run; in the terminal you will see the process being killed automatically.

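When testing speech-to-text, I recorded a WAV file with my phone's voice recorder app. When PaddleSpeech transcribed it, it printed:

The sample rate of the input file is not 16000.The program will resample the wav file to 16000.If the result does not meet your expectations，Please input the 16k 16 bit 1 channel wav file.

PaddleSpeech expects audio with a 16000 Hz sample rate. If the input file does not meet this requirement, pressing y at the prompt makes the program convert the audio into a form it can recognize, after which it returns the recognition result. At this point I was using the official sample code; only the audio file was my own recording.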
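Before calling PaddleSpeech, you can check whether a file already meets the "16k 16 bit 1 channel" requirement and skip conversion when it does. A minimal sketch using Python's standard wave module; is_16k_16bit_mono is a hypothetical helper, not part of PaddleSpeech. The wave module only reads uncompressed PCM WAV, so anything else (MP3, AAC, and some WAV variants) is simply reported as needing conversion.

```python
import wave


def is_16k_16bit_mono(path):
    """Hypothetical helper: True if `path` is a 16 kHz, 16-bit, mono PCM WAV file."""
    try:
        with wave.open(path, "rb") as wf:
            return (
                wf.getframerate() == 16000  # "16k": 16000 samples per second
                and wf.getsampwidth() == 2  # "16 bit": 2 bytes per sample
                and wf.getnchannels() == 1  # "1 channel": mono
            )
    except (wave.Error, EOFError):
        # Not a PCM WAV file the wave module can parse (e.g. an MP3 from a phone).
        return False
```

Files for which this returns False are the ones that would trigger the resampling prompt.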

from paddlespeech.cli.asr.infer import ASRExecutor
asr = ASRExecutor()
result = asr(audio_file="luyin.wav")
print(result)

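I needed to expose this as an HTTP API. Inside an API there is no way for the user to answer the prompt from the keyboard, so audio in the wrong format never gets transcribed. The audio therefore has to be converted to 16k 16 bit 1 channel WAV beforehand, and the converted file passed to PaddleSpeech. I don't know whether the source code provides a conversion function you can call (the server only had vim, which made finding and reading code inconvenient), so I simply used ffmpeg (running a shell command from Python). For installing ffmpeg, see these two posts: "Installing and using ffmpeg on CentOS" (開普勒醒醒吧, cnblogs.com) and "Installing ffmpeg on CentOS" (qq_duhai, CSDN blog).

You can also download a statically compiled build here, which saves you from resolving dependencies yourself.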

ffmpeg -y -i input.wav  -ac 1 -ar 16000  -b:a 16k  output.wav # from https://blog.csdn.net/Ezerbel/article/details/124393431

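The file produced by this command has the same format as zh.wav, the sample file provided by PaddleSpeech; you can check this in PotPlayer. In the command, -ac 1 sets one channel (mono) and -ar 16000 sets the 16 kHz sample rate. -b:a 16k sets an audio bitrate, which has no real effect on uncompressed WAV output; the 16-bit sample depth comes from ffmpeg's default WAV encoder, pcm_s16le.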
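The same conversion can also be run from Python. A minimal sketch, assuming ffmpeg is on PATH: it passes an argument list to subprocess.run instead of building a shell string for os.system, and uses an explicit -acodec pcm_s16le in place of -b:a 16k. Both function names are hypothetical.

```python
import shutil
import subprocess


def build_ffmpeg_command(src, dst):
    """Hypothetical helper: ffmpeg arguments for 16 kHz, 16-bit, mono WAV output."""
    return [
        "ffmpeg",
        "-y",                    # overwrite dst if it already exists
        "-i", src,               # input; ffmpeg detects the format from the content
        "-ac", "1",              # 1 channel (mono)
        "-ar", "16000",          # 16000 Hz sample rate
        "-acodec", "pcm_s16le",  # 16-bit little-endian PCM samples
        dst,
    ]


def convert_to_16k_mono_wav(src, dst):
    """Run ffmpeg; return dst on success, or None if ffmpeg is missing or fails."""
    if shutil.which("ffmpeg") is None:
        return None
    # A list (no shell) avoids quoting problems with unusual file names.
    # subprocess.run waits for ffmpeg to exit, so no sleeping or polling is needed.
    result = subprocess.run(
        build_ffmpeg_command(src, dst),
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.PIPE,
    )
    return dst if result.returncode == 0 else None
```

On failure, result.stderr holds ffmpeg's error message, which is worth logging in an API.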

Complete code for speech-to-text and text-to-speech as HTTP endpoints

import os
import random
import time
import json
import base64
import shutil

from paddlespeech.cli.asr.infer import ASRExecutor
from paddlespeech.cli.tts.infer import TTSExecutor
from flask import Flask, request

app = Flask(__name__)
asr = ASRExecutor()  # Create once as a global; re-creating it on every request can exhaust GPU memory. From https://github.com/PaddlePaddle/PaddleSpeech/issues/2881 and https://github.com/PaddlePaddle/PaddleSpeech/issues/2908
tts = TTSExecutor()

# Shared helper, usable by all endpoints
def random_string(length=32):  # Random alphanumeric string (32 characters by default), used for random file names
    string = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
    return ''.join(random.choice(string) for _ in range(length))

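# Shared helper, usable by all endpoints
# folder_name is used on the server to build the save path, so that different tasks keep their
# files in different folders (f'/home/python/speech/{folder_name}/'); not needed for local testing.
def base64_to_audio(audio_base64, folder_name=None):
    audio_base64 = audio_base64.split(',')[-1]  # drop an optional 'data:audio/...;base64,' prefix
    audio = base64.b64decode(audio_base64)
    # No extension: the original format of the uploaded audio is unknown (phone recordings vary)
    audio_file_name = random_string() + '_' + str(int(time.time()))
    audio_folder = f'/home/python/speech/{folder_name}/'
    os.makedirs(audio_folder, exist_ok=True)  # open() below fails if the folder does not exist
    audio_file_path = audio_folder + audio_file_name
    with open(audio_file_path, 'wb') as f:
        f.write(audio)
    return audio_file_path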

# Convert the received audio to a 16k 16 bit 1 channel WAV file: 16k is the 16000 Hz sample rate,
# 16 bit is the sample depth (2 bytes per sample), and 1 channel means mono.
# If the file passed to PaddleSpeech has the wrong format, it prints "The sample rate of the input file is not 16000. ..."
# and waits for keyboard input, so the audio has to be converted beforehand.
def resample_rate(audio_path_input):
    audio_path_output = audio_path_input + '_output' + '.wav'  # audio_path_input has no extension, so just append one
    # The output has the same format as zh.wav from the PaddleSpeech README (https://paddlespeech.bj.bcebos.com/PaddleAudio/zh.wav,
    # which says "I think the most important thing running has given me is good health"). From https://blog.csdn.net/Ezerbel/article/details/124393431
    command = f'ffmpeg -y -i {audio_path_input}  -ac 1 -ar 16000  -b:a 16k  {audio_path_output}'
    # os.system blocks until ffmpeg exits (an 8.46 MB MP3 took about 0.48 s on a test VM), so no extra waiting is needed.
    # From https://blog.csdn.net/liulanba/article/details/115466783
    command_result = os.system(command)
    # A non-zero exit status or a missing output file means ffmpeg failed; return None instead of raising.
    if command_result != 0 or not os.path.exists(audio_path_output):
        return None
    return audio_path_output

# Speech to text
# Accepts POST only
@app.route("/speechtotext", methods=["POST"])
def speech_to_text():
    # Base64 of the audio to transcribe; a leading 'data:audio/wav;base64,' is optional
    audio_file_base64 = (request.get_json(silent=True) or {}).get('audio_file_base64')
    if not audio_file_base64:
        return json.dumps({'code': 400, 'msg': 'Missing audio_file_base64'}, ensure_ascii=False)
    audio_file_path = base64_to_audio(audio_file_base64, folder_name='speech_to_text/audio_file')  # save the received original audio

    audio_path_output = resample_rate(audio_path_input=audio_file_path)
    if audio_path_output:
        # asr = ASRExecutor()
        # Each call writes a PaddleSpeech log file into an exp/log folder next to this script
        # (observed with paddlepaddle==2.3.2, paddlespeech==1.2.0). From https://github.com/PaddlePaddle/PaddleSpeech/issues/1211
        result = asr(audio_file=audio_path_output)

        os.remove(audio_file_path)  # on success, delete the received original and the converted file
        os.remove(audio_path_output)
        # try:
        #     shutil.rmtree('')  # deletes a folder; raises if it does not exist. Use this to delete the log folder if needed. From https://blog.csdn.net/a1579990149wqh/article/details/124953746
        # except Exception as e:
        #     pass

        return json.dumps({'code': 200, 'msg': 'Recognition succeeded', 'data': result}, ensure_ascii=False)
    else:
        return json.dumps({'code': 400, 'msg': 'Recognition failed'}, ensure_ascii=False)

# Text to speech
# Accepts POST only
@app.route("/texttospeech", methods=["POST"])
def text_to_speech():
    text_str = (request.get_json(silent=True) or {}).get('text')  # text to synthesize
    if not text_str:
        return json.dumps({'code': 400, 'msg': 'Missing text'}, ensure_ascii=False)

    # tts = TTSExecutor()
    audio_folder = '/home/python/speech/text_to_speech/audio_file/'  # trailing '/' so the file lands inside the folder
    os.makedirs(audio_folder, exist_ok=True)
    audio_file_name = random_string() + '_' + str(int(time.time())) + '.wav'
    audio_file_path = audio_folder + audio_file_name
    # Outputs a 24 kHz WAV file. As in speech_to_text(), each call also writes a PaddleSpeech log file into exp/log next to this script.
    tts(text=text_str, output=audio_file_path)
    if os.path.exists(audio_file_path):
        with open(audio_file_path, 'rb') as f:
            base64_str = base64.b64encode(f.read()).decode('utf-8')  # no leading 'data:audio/wav;base64,'

        os.remove(audio_file_path)  # on success, delete the generated audio file
        # try:
        #     shutil.rmtree('')  # deletes a folder; raises if it does not exist. Use this to delete the log folder if needed. From https://blog.csdn.net/a1579990149wqh/article/details/124953746
        # except Exception as e:
        #     pass

        return json.dumps({'code': 200, 'msg': 'Synthesis succeeded', 'data': base64_str}, ensure_ascii=False)
    else:
        return json.dumps({'code': 400, 'msg': 'Synthesis failed'}, ensure_ascii=False)

if __name__=='__main__':
    app.run(host='127.0.0.1', port=9723)
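To try the two endpoints from another Python program, here is a minimal client sketch using only the standard library. It assumes the server above is running at http://127.0.0.1:9723 and relies on the JSON field names used in the code (audio_file_base64, text, code, data); the helper names are hypothetical.

```python
import base64
import json
import urllib.request

# Assumed to match the app.run() address of the server above.
BASE_URL = "http://127.0.0.1:9723"


def build_asr_payload(wav_bytes):
    """JSON body for /speechtotext: the raw audio bytes as plain base64."""
    return {"audio_file_base64": base64.b64encode(wav_bytes).decode("ascii")}


def parse_tts_response(body):
    """Decode a /texttospeech response; return WAV bytes, or None on failure."""
    data = json.loads(body)
    if data.get("code") != 200:
        return None
    return base64.b64decode(data["data"])


def post_json(path, payload, timeout=60):
    """POST `payload` as JSON to BASE_URL + path and return the response text."""
    req = urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

For example, post_json('/speechtotext', build_asr_payload(wav_bytes)) returns the JSON text with the transcript in data, and parse_tts_response(post_json('/texttospeech', {'text': '你好'})) returns WAV bytes you can write to a file. The server reports failure through the code field in the body while the HTTP status stays 200, which is why the client checks code.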

Final notes

To adjust the speaking rate, see "Can a TTS model I fine-tuned myself change the speaking rate?" · Issue #2383 · PaddlePaddle/PaddleSpeech · GitHub.

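If you use the GPU version and want to check whether the GPU is actually being used, see "Can speech synthesis run inference on the GPU, and if so, how?" · Issue #2467 · PaddlePaddle/PaddleSpeech · GitHub. You can also check GPU usage with the nvidia-smi command.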
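To do the nvidia-smi check from Python, for instance logging GPU memory before and after each request to spot the growth described in the issues below, here is a sketch assuming nvidia-smi from the NVIDIA driver is on PATH. It uses nvidia-smi's --query-gpu CSV output (memory values in MiB); parse_gpu_memory and query_gpu_memory are hypothetical helpers.

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text):
    """Parse `nvidia-smi --query-gpu=index,memory.used,memory.total
    --format=csv,noheader,nounits` output into a list of dicts (MiB)."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        gpus.append({"index": int(index), "used_mib": int(used), "total_mib": int(total)})
    return gpus


def query_gpu_memory():
    """Return parsed GPU memory usage, or None if nvidia-smi is not available."""
    if shutil.which("nvidia-smi") is None:
        return None
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True,
    ).stdout
    return parse_gpu_memory(out)
```

If used_mib keeps rising across requests while the executors are already global, that matches the symptom in the issues below.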

If GPU memory is not released during use and you run out of it, see "GPU memory keeps growing during audio-to-text conversion and finally runs out of memory" · Issue #2881 · PaddlePaddle/PaddleSpeech · GitHub and "[TTS] GPU memory not released after synthesizing on the GPU" · Issue #2908 · PaddlePaddle/PaddleSpeech · GitHub.
