1.whisper部署
詳細(xì)過程可以參照:??
創(chuàng)建項目文件夾
mkdir whisper && cd whisper
conda創(chuàng)建虛擬環(huán)境
conda create -n py310 python=3.10 -c conda-forge -y
安裝pytorch
pip install --pre torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/nightly/cpu
下載whisper
pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
安裝相關(guān)包
pip install tqdm
pip install numba
pip install tiktoken==0.3.3
brew install ffmpeg
測試一下whisper是否安裝成功(默認(rèn)識別為中文)
whisper test.wav --model small #test.wav為自己的測試wav文件,mp3也支持 small是指用小模型
whisper識別中文的時候經(jīng)常會輸出繁體,加入以下參數(shù)可以避免:
whisper test.wav --model small --language zh --initial_prompt "以下是普通話的句子。" #注意"以下是普通話的句子。"不能隨便修改,只能是這句話才有效果。
2.腳本批量測試
創(chuàng)建test.sh腳本,輸入以下內(nèi)容,可以實現(xiàn)對某一文件夾下的wav文件逐個中文語音識別。
#!/bin/bash
# Transcribe wav/A13_0.wav .. wav/A13_299.wav in order, stopping at the
# first missing file. Results are written to the "denied" directory.
i=0
while [ "$i" -lt 300 ]; do
    file="wav/A13_${i}.wav"
    [ -f "$file" ] || break
    # medium model, Mandarin; the fixed initial prompt nudges whisper to
    # emit simplified Chinese instead of traditional characters.
    whisper "$file" --model medium --output_dir denied --language zh --initial_prompt "以下是普通話的句子。"
    i=$((i + 1))
done
?實現(xiàn)英文語音識別需要修改為:
#!/bin/bash
# English ASR over en/0.wav .. en/299.wav, stopping at the first gap in
# the numbering. Results are written to the "denied" directory.
for i in $(seq 0 299); do
    file="en/${i}.wav"
    if [ ! -f "$file" ]; then
        break
    fi
    # small model is sufficient for English here.
    whisper "$file" --model small --output_dir denied --language en
done
3.對運(yùn)行出來的結(jié)果進(jìn)行評測
一般地,語音識別通常采用WER,即詞錯誤率,評估語音識別和文本轉(zhuǎn)換質(zhì)量。
這里我們主要采用 github上的開源項目:???編寫的python-wer代碼對結(jié)果進(jìn)行評價。
其中,我們的正確樣本形式為:
?whisper輸出的預(yù)測結(jié)果形式為:
?因此要對文本進(jìn)行處理(去空格、去標(biāo)點符號)后進(jìn)行wer評價,相關(guān)代碼如下:
(可根據(jù)具體情況修改calculate_WER)
import re
import sys

import numpy
def editDistance(r, h):
    """
    Compute the edit-distance (Levenshtein) DP matrix of a reference
    sentence and a hypothesis sentence via dynamic programming.

    Attributes:
        r -> the list of words produced by splitting reference sentence.
        h -> the list of words produced by splitting hypothesis sentence.

    Returns:
        A (len(r)+1) x (len(h)+1) numpy matrix d where d[i][j] is the edit
        distance between r[:i] and h[:j]; d[len(r)][len(h)] is the total
        edit distance between r and h.
    """
    # NOTE: the original used dtype=numpy.uint8, which silently wraps once
    # a distance exceeds 255 (long sentences) and corrupts the WER; use a
    # native integer dtype instead.
    d = numpy.zeros((len(r) + 1, len(h) + 1), dtype=int)
    # Base cases: converting to/from an empty prefix costs its length.
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            if r[i - 1] == h[j - 1]:
                d[i][j] = d[i - 1][j - 1]  # match: no cost
            else:
                substitute = d[i - 1][j - 1] + 1
                insert = d[i][j - 1] + 1
                delete = d[i - 1][j] + 1
                d[i][j] = min(substitute, insert, delete)
    return d
def getStepList(r, h, d):
    """
    Recover the list of edit operations by backtracking through the DP
    matrix built by editDistance.

    Attributes:
        r -> the list of words produced by splitting reference sentence.
        h -> the list of words produced by splitting hypothesis sentence.
        d -> the matrix built when calculating the edit distance of h and r.

    Returns:
        A list of one-letter step codes in sentence order:
        "e" = equal, "i" = insertion, "s" = substitution, "d" = deletion.
    """
    x = len(r)
    y = len(h)
    steps = []  # renamed from `list`, which shadowed the builtin
    while True:
        if x == 0 and y == 0:
            break
        # Branch order matters: it fixes the tie-breaking among equally
        # cheap paths and must stay consistent with alignedPrint.
        elif x >= 1 and y >= 1 and d[x][y] == d[x - 1][y - 1] and r[x - 1] == h[y - 1]:
            steps.append("e")
            x = x - 1
            y = y - 1
        elif y >= 1 and d[x][y] == d[x][y - 1] + 1:
            steps.append("i")
            y = y - 1
        elif x >= 1 and y >= 1 and d[x][y] == d[x - 1][y - 1] + 1:
            steps.append("s")
            x = x - 1
            y = y - 1
        else:
            steps.append("d")
            x = x - 1
    # steps were collected back-to-front; reverse into reading order.
    return steps[::-1]
def alignedPrint(list, r, h, result):
    """
    Print reference and hypothesis sentences aligned word-by-word,
    followed by an evaluation row (D/I/S marks) and the WER.

    Attributes:
        list   -> the list of steps ("e"/"i"/"s"/"d") from getStepList.
        r      -> the list of words produced by splitting reference sentence.
        h      -> the list of words produced by splitting hypothesis sentence.
        result -> the WER string (e.g. "12.50%") computed from the edit distance.

    Returns:
        The same `result` string, so the caller can keep using it.
    """
    # Parameter name kept for interface compatibility; alias it so we do
    # not rely on the shadowed builtin below.
    ops = list

    def _idx(i, skipped):
        # Map step position i to an index into r (skipped="i") or h
        # (skipped="d") by discounting steps that consumed no word on
        # that side. This replaces the counting loop duplicated nine
        # times in the original.
        return i - sum(1 for op in ops[:i] if op == skipped)

    print("REF:", end=" ")
    for i, op in enumerate(ops):
        if op == "i":
            # Insertion: reference has no word here; pad to hyp word width.
            print(" " * len(h[_idx(i, "d")]), end=" ")
        elif op == "s":
            w_r, w_h = r[_idx(i, "i")], h[_idx(i, "d")]
            print(w_r.ljust(len(w_h)) if len(w_r) < len(w_h) else w_r, end=" ")
        else:
            print(r[_idx(i, "i")], end=" ")

    print("\nHYP:", end=" ")
    for i, op in enumerate(ops):
        if op == "d":
            # Deletion: hypothesis has no word here; pad to ref word width.
            print(" " * len(r[_idx(i, "i")]), end=" ")
        elif op == "s":
            w_r, w_h = r[_idx(i, "i")], h[_idx(i, "d")]
            print(w_h.ljust(len(w_r)) if len(w_r) > len(w_h) else w_h, end=" ")
        else:
            print(h[_idx(i, "d")], end=" ")

    print("\nEVA:", end=" ")
    for i, op in enumerate(ops):
        if op == "d":
            print("D" + " " * (len(r[_idx(i, "i")]) - 1), end=" ")
        elif op == "i":
            print("I" + " " * (len(h[_idx(i, "d")]) - 1), end=" ")
        elif op == "s":
            # Pad the S mark to the wider of the two aligned words.
            width = max(len(r[_idx(i, "i")]), len(h[_idx(i, "d")]))
            print("S" + " " * (width - 1), end=" ")
        else:
            print(" " * len(r[_idx(i, "i")]), end=" ")

    print("\nWER: " + result)
    return result
def wer(r, h):
    """
    Calculate the word error rate (WER) in ASR, print the aligned
    comparison, and return the WER as a percentage string.

    You can use it like this: wer("what is it".split(), "what is".split())
    For Chinese, passing raw strings compares character-by-character (CER).

    Attributes:
        r -> reference as a sequence of words (or characters).
        h -> hypothesis as a sequence of words (or characters).

    Returns:
        The WER formatted as a percentage string, e.g. "25.00%".

    Raises:
        ZeroDivisionError: if the reference `r` is empty.
    """
    # build the DP matrix of edit distances
    d = editDistance(r, h)
    # find out the manipulation steps (renamed from `list` — it shadowed
    # the builtin in the original)
    steps = getStepList(r, h, d)
    # total edit distance divided by reference length, as a percentage
    rate = float(d[len(r)][len(h)]) / len(r) * 100
    formatted = "%.2f%%" % rate
    # alignedPrint echoes the alignment and returns the same string
    return alignedPrint(steps, r, h, formatted)
# Compute the average WER over all utterance pairs (計算總WER)
def calculate_WER():
    """
    Compute and print the average WER between whisper's output
    ("whisper_out.txt") and the reference transcripts ("A13.txt").

    Both files are expected to hold one utterance per line; the first 11
    characters of each line are an utterance-id prefix and are stripped.
    Scoring starts at the first CJK character of each line; the hypothesis
    is stripped of full-width punctuation and the reference of spaces, so
    the comparison is effectively character-based (CER) for Chinese.
    """
    import re  # local import: the original file used re without importing it

    with open("whisper_out.txt", "r", encoding="utf-8") as f:
        hyp_lines = [line[11:].strip("\n") for line in f.readlines()]
    with open("A13.txt", "r", encoding="utf-8") as f:
        ref_lines = [line[11:].strip("\n") for line in f.readlines()]
    if not hyp_lines:
        print("whisper_out.txt is empty — nothing to score")
        return
    WER = 0
    # Full-width punctuation to drop from the hypothesis before scoring.
    symbols = ",@#¥%……&*()——+~!{}【】;‘:“”‘。?》《、"
    # calculate distance between each pair of texts
    for hyp_raw, ref_raw in zip(hyp_lines, ref_lines):
        match1 = re.search('[\u4e00-\u9fa5]', hyp_raw)
        index1 = match1.start() if match1 else len(hyp_raw)
        match2 = re.search('[\u4e00-\u9fa5]', ref_raw)
        index2 = match2.start() if match2 else len(ref_raw)
        hyp = hyp_raw[index1:].translate(str.maketrans('', '', symbols))
        ref = ref_raw[index2:].replace(" ", "")
        print(hyp)
        print(ref)
        # BUGFIX: wer(r, h) expects the reference first; the original
        # passed the whisper output as the reference, so the error rate
        # was normalized by the hypothesis length.
        result = wer(ref, hyp)
        WER += float(result.strip('%')) / 100
    WER = WER / len(hyp_lines)
    print("總WER:", WER)
    print("總WER:", WER.__format__('0.2%'))
calculate_WER()
評價結(jié)果形如:
4.與paddlespeech的測試對比:
數(shù)據(jù)集 |
數(shù)據(jù)量 |
paddle (中英文分開) |
paddle (同一模型) |
whisper(small) (同一模型) |
whisper(medium) (同一模型) |
||
zhthchs30 (中文錯字率) |
250 |
11.61% |
45.53% |
24.11% |
13.95% |
||
LibriSpeech (英文錯字率) |
125 |
7.76% |
50.88% |
9.31% |
9.31% |
5.測試所用數(shù)據(jù)集
自己處理過的開源wav數(shù)據(jù)文章來源地址http://www.zghlxwxcb.cn/news/detail-496316.html
到了這里,關(guān)于whisper語音識別部署及WER評價的文章就介紹完了。如果您還想了解更多內(nèi)容,請在右上角搜索TOY模板網(wǎng)以前的文章或繼續(xù)瀏覽下面的相關(guān)文章,希望大家以后多多支持TOY模板網(wǎng)!