Speech Recognition: Trying Baidu Speech and Using OpenAI's Open-Source Whisper


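0. Preface: The author personally tried Baidu Cloud speech recognition, Tencent Cloud, Java's SpeechRecognition package, and Whisper, the speech recognition model that OpenAI recently open-sourced for free (highly recommended). The article also introduces the principles behind common speech recognition approaches.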

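1. NLP, natural language processing (the processing of human language). The same word "你好" ("hello") produces a different signal for every speaker.

  Unit k: 16k means a 16 kHz sampling rate, so 1 second of sound is represented by 16000 numbers (a vector).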
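
The arithmetic behind "16k" can be sketched as follows. This is a minimal illustration, assuming 16-bit mono PCM (the same format the recording example later in this article opens); the function names are made up for the example.

```python
SAMPLE_RATE_HZ = 16_000  # "16k": number of samples per second
BYTES_PER_SAMPLE = 2     # assumption: 16-bit PCM
CHANNELS = 1             # assumption: mono


def num_samples(seconds: float, sample_rate: int = SAMPLE_RATE_HZ) -> int:
    """Number of amplitude values needed to represent `seconds` of audio."""
    return int(round(seconds * sample_rate))


def pcm_size_bytes(seconds: float) -> int:
    """Size in bytes of the raw PCM buffer holding `seconds` of audio."""
    return num_samples(seconds) * BYTES_PER_SAMPLE * CHANNELS


print(num_samples(1.0))      # 16000 numbers for one second
print(pcm_size_bytes(1.0))   # 32000 bytes for one second
print(pcm_size_bytes(60.0))  # 1920000 bytes for one minute
```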

[Figure: a, a1]

2. Task categories

   audio --> text (speech recognition)
   audio --> audio (e.g. voice conversion, speech separation)
   audio --> class (keyword spotting, e.g. "hey siri")

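3. Problems deep learning brings to speech: synthesis goes wrong with some probability. The same word repeated a different number of times can be synthesized differently:

   發財發財發財
   發財發財 // the intonation comes out different
   發財  // only "發" is pronounced

Speech separation (two people talking at the same time)
Voice and intonation imitation (used in telecom fraud)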

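4. How to recognize: choosing the output unit (token)

   word      Chinese has no spaces, so 一拳超人 could be segmented as 一拳 超人 or 一拳超 人;
             English words are delimited by spaces, e.g. personal computer
   morpheme  the smallest meaningful unit, e.g. the root "break" in "unbreakable"
   bytes     every language is encoded as 0/1 bytes, so the units are language independent
   grapheme  the smallest unit of the writing system (a letter in English, a character in Chinese)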
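
The difference between these units can be made concrete with a short sketch. The helper names are hypothetical, and splitting by Unicode code point is only an approximation of graphemes (it is exact for the simple strings used here, but not for text with combining marks).

```python
def word_tokens(text: str) -> list[str]:
    """Split on whitespace. Works for English, not for unsegmented Chinese."""
    return text.split()


def grapheme_tokens(text: str) -> list[str]:
    """Approximate graphemes by Unicode code points."""
    return list(text)


def byte_tokens(text: str) -> list[int]:
    """Language-independent units: the UTF-8 bytes of the text."""
    return list(text.encode("utf-8"))


print(word_tokens("personal computer"))  # ['personal', 'computer']
print(word_tokens("一拳超人"))            # ['一拳超人'] (no spaces, so no split)
print(grapheme_tokens("好棒"))            # ['好', '棒']
print(byte_tokens("好"))                  # [229, 165, 189]
```

Byte tokens give a fixed vocabulary of 256 symbols for any language, at the price of longer sequences.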

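5. Common models

LAS (Listen, Attend and Spell): the encoder extracts features over a range of frames and the decoder attends over them. Neighboring frames carry nearly the same information. Because attention needs the whole utterance, it cannot transcribe in real time.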

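CTC: sequence to sequence, and it can output in real time (Figure ctc). Each frame emits a token or null; repeated tokens are merged and then null is removed, e.g. 好 好 null 棒 棒 --> 好棒.
The frame-level labels (alignments) have to be constructed yourself for training; for example, null null 好 棒 and 好 null 棒 棒 are both valid alignments of 好棒.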
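
The CTC collapsing rule (merge consecutive repeats, then drop null) can be sketched in a few lines. The function name and the use of the string "null" as the blank symbol are choices made for this illustration.

```python
from itertools import groupby

BLANK = "null"  # the blank symbol, written as "null" in the notes above


def ctc_collapse(frames: list[str], blank: str = BLANK) -> list[str]:
    """Collapse a frame-level CTC output into the final token sequence."""
    # Step 1: merge runs of identical consecutive tokens.
    merged = [token for token, _ in groupby(frames)]
    # Step 2: remove the blank symbol.
    return [token for token in merged if token != blank]


print("".join(ctc_collapse(["好", "好", "null", "棒", "棒"])))  # 好棒
print("".join(ctc_collapse(["null", "null", "好", "棒"])))      # 好棒
# A null between two identical tokens keeps them separate:
print("".join(ctc_collapse(["好", "null", "好", "棒"])))        # 好好棒
```

Because several frame sequences collapse to the same text, training has to consider every valid alignment of the target.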

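RNN-T: sequence to sequence. For each frame it keeps emitting output until it is satisfied with the result, then moves on to the next frame.
Figure rnnt/1. This addresses the problem of having to construct the training labels yourself. Related models move a window over the input and apply attention within that range; in MoChA the window size changes dynamically.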

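HMM: the solution used before deep learning. It estimates probabilities in units of phonemes (units of pronunciation). Tri-phone: in "what do you", the pronunciation of "do" is affected by "what" and "you".
The model predicts the probability of the next state (Figure hmm1).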
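
The tri-phone idea can be illustrated by expanding a phoneme sequence into context-dependent units. This is a sketch only: the phoneme symbols for "what do you" are an informal transcription chosen for the example, the `left-center+right` naming follows the common HTK-style convention, and `sil` (silence) is assumed as the boundary context.

```python
def to_triphones(phonemes: list[str], boundary: str = "sil") -> list[str]:
    """Turn each phoneme into a unit that also names its left and right neighbor."""
    padded = [boundary] + list(phonemes) + [boundary]
    return [
        f"{padded[i - 1]}-{padded[i]}+{padded[i + 1]}"
        for i in range(1, len(padded) - 1)
    ]


# Informal phonemes for "what do you": w ah t | d uw | y uw
units = to_triphones(["w", "ah", "t", "d", "uw", "y", "uw"])
print(units)
```

The "d" of "do" becomes `t-d+uw` and its vowel becomes `d-uw+y`, so the units for "do" depend on the end of "what" and the start of "you".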
[Figure: ctc]

[Figure: hmm]

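6. Applying deep learning to these models

Tandem: everywhere around 2009. A network is trained to output speech probabilities, which are then fed into the existing model.
DNN-HMM Hybrid: the mainstream approach as of 2019 (Google and IBM reached about a 5% error rate). The DNN (which uses a single file) can be trained.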

[Figure: comparison; "not gen" means no path can reach that output]

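7. JavaScript can do speech recognition in the browser (it calls a Google API, which is blocked in mainland China, so a VPN is required)
// It works very well, but it needs a VPN plus a Node server, and I don't know whether using it inside a company would cause disputes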

<!DOCTYPE html>
<html>
<head>
  <meta charset="UTF-8">
  <title>Speech Recognition Example</title>
</head>
<body>
  <h1>Speech Recognition Example</h1>
  
  <button id="start-btn">Start recognition</button>
  <button id="stop-btn">Stop recognition</button>

  <div id="result-div"></div>

  <script>
    // Get the DOM elements
    const startBtn = document.querySelector('#start-btn');
    const stopBtn = document.querySelector('#stop-btn');
    const resultDiv = document.querySelector('#result-div');

    // Create a SpeechRecognition object (Chrome exposes it with the webkit prefix)
    const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
    const recognition = new SpeechRecognition();

    // Configure recognition
    recognition.lang = 'zh-CN'; // recognize Mandarin Chinese
    recognition.continuous = true; // keep listening until stopped

    // Start recognition
    startBtn.addEventListener('click', function() {
      recognition.start();
    });

    // Stop recognition
    stopBtn.addEventListener('click', function() {
      recognition.stop();
    });

    // Handle recognition results
    recognition.onresult = function(event) {
      const result = event.results[event.resultIndex][0].transcript;
      // Use textContent so the transcript is never parsed as HTML
      const p = document.createElement('p');
      p.textContent = result;
      resultDiv.appendChild(p);
    };

    // Handle recognition errors
    recognition.onerror = function(event) {
      console.error('Speech recognition error:', event.error);
    };
  </script>
</body>
</html>

Using SpeechRecognition: there is no Chinese language pack, and English input was recognized as nothing but "oh".

9. Baidu Cloud speech recognition (recognition works, but odd sentences show up when nobody is speaking). It is free for half a year, which is quite nice; Tencent Cloud only offers 5000 trial calls.

https://console.bce.baidu.com/ai/#/ai/speech/app/list

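// Figure baidu
// Recognizing an audio file: a controller only needs to read the I/O stream into a byte array to run recognition. I think generating a fresh pcm for every request should avoid the odd recognition results shown in the figure below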

import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;

import com.baidu.aip.speech.AipSpeech;
import org.json.JSONObject;

public class test01 {

    // Obtained after creating an application on the Baidu AI platform
    private static final String APP_ID = "xxxx";
    private static final String API_KEY = "xxxx";
    private static final String SECRET_KEY = "xxxxx";

    public static void main(String[] args) throws Exception {
        // Initialize the AipSpeech client
        AipSpeech client = new AipSpeech(APP_ID, API_KEY, SECRET_KEY);

        // Request parameters
        HashMap<String, Object> options = new HashMap<String, Object>();
        options.put("dev_pid", 1537); // Mandarin model

        // Read the whole audio file; a single InputStream.read() call
        // is not guaranteed to fill the array
        byte[] data = Files.readAllBytes(Paths.get("path/to/audio/file.pcm"));

        // Call the speech recognition API (raw PCM, 16 kHz)
        JSONObject result = client.asr(data, "pcm", 16000, options);
        if (result.getInt("err_no") == 0) {
            String text = result.getJSONArray("result").getString(0);
            System.out.println("Recognition result: " + text);
        } else {
            System.out.println("Recognition failed: " + result.getString("err_msg"));
        }
    }
}

// Real-time recording test
// Figure baidu

// Possible optimization: upload a file directly, as with image processing, instead of sending a stream

import java.util.Arrays;
import java.util.HashMap;
import javax.sound.sampled.*;

import com.baidu.aip.speech.AipSpeech;
import org.json.JSONObject;

public class test01 {

    // Obtained after creating an application on the Baidu AI platform
    private static final String APP_ID = "xxxxxxx";
    private static final String API_KEY = "xxxxxx";
    private static final String SECRET_KEY = "xxxxxx";

    public static void main(String[] args) throws Exception {
        // Initialize the AipSpeech client
        AipSpeech client = new AipSpeech(APP_ID, API_KEY, SECRET_KEY);

        // Request parameters
        HashMap<String, Object> options = new HashMap<String, Object>();
        options.put("dev_pid", 1537); // Mandarin model

        // Capture microphone audio: 16 kHz, 16-bit, mono, signed, little-endian
        AudioFormat format = new AudioFormat(16000, 16, 1, true, false);
        TargetDataLine line = AudioSystem.getTargetDataLine(format);
        line.open(format);
        line.start();

        // Buffer holding one second of audio (sample rate * bytes per frame)
        int bufferSize = (int) format.getSampleRate() * format.getFrameSize();
        byte[] buffer = new byte[bufferSize];

        // Read and recognize the audio one buffer at a time
        while (true) {
            int count = line.read(buffer, 0, buffer.length);
            if (count > 0) {
                // Send only the bytes actually read, not stale data left in the buffer
                byte[] chunk = Arrays.copyOf(buffer, count);
                JSONObject result = client.asr(chunk, "pcm", 16000, options);
                if (result.getInt("err_no") == 0) {
                    String text = result.getJSONArray("result").getString(0);
                    System.out.println("Recognition result: " + text);
                } else {
                    System.out.println("Recognition failed: " + result.getString("err_msg"));
                }
            }
        }
    }
}
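
The odd sentences that appear when nobody is speaking suggest that silent buffers are being sent to the recognizer. One possible mitigation, which is not part of the original test, is to skip buffers whose energy is below a threshold before calling the API. The sketch below assumes 16-bit little-endian mono PCM (the format opened above); the function names and the threshold of 500 are hypothetical and would need tuning for a real microphone.

```python
import math
import struct


def rms_16bit_le(pcm: bytes) -> float:
    """Root-mean-square amplitude of 16-bit little-endian PCM samples."""
    n = len(pcm) // 2
    if n == 0:
        return 0.0
    samples = struct.unpack(f"<{n}h", pcm[: 2 * n])
    return math.sqrt(sum(s * s for s in samples) / n)


def is_silence(pcm: bytes, threshold: float = 500.0) -> bool:
    """True if the buffer is quiet enough to skip (threshold is a guess)."""
    return rms_16bit_le(pcm) < threshold


one_second_of_silence = bytes(32000)  # 16000 samples * 2 bytes, all zero
loud = struct.pack("<4h", 10000, -10000, 10000, -10000)

print(is_silence(one_second_of_silence))  # True
print(is_silence(loud))                   # False
```

The same check could be ported to the Java loop so that `client.asr` is only called for buffers that are not silent.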

10. Tencent Cloud speech recognition: 5000 free calls. Readers can download the project and take a look

  // Console
   https://console.cloud.tencent.com/asr#
 // Project repository
 https://github.com/TencentCloud/tencentcloud-speech-sdk-java

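11. Using Whisper (open-sourced on September 21, 2022. OpenAI is really generous here. Tencent Cloud real-time recognition costs 2 yuan per hour, which is not expensive, but most companies need to keep costs down. There is also a tiny model that suits embedded use)

1. Install Python 3.10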

pip3 install torch torchvision torchaudio

2. Install Chocolatey (choco) and ffmpeg from PowerShell

 Set-ExecutionPolicy Bypass -Scope Process -Force; [System.Net.ServicePointManager]::SecurityProtocol = [System.Net.ServicePointManager]::SecurityProtocol -bor 3072; iex ((New-Object System.Net.WebClient).DownloadString('https://community.chocolatey.org/install.ps1'))

// If ffmpeg cannot be found, switch to the Aliyun mirror. ffmpeg handles the audio decoding; without it Whisper fails with path/file-not-found errors

choco source add --name=aliyun-choco-source --source=https://mirrors.aliyun.com/chocolatey/
choco source set --name="'aliyun-choco-source'"
choco source list
choco install ffmpeg

3. Test. It runs quite fast. A smaller model should be faster still, so accurate and fast near-real-time speech recognition looks achievable.

whisper test1.mp4

Result:
[Figure: output of the command above]
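
The same transcription can also be driven from Python instead of the command line. The sketch below is a minimal wrapper, assuming the `openai-whisper` package is installed (`pip install -U openai-whisper`); it uses the package's `load_model` and `transcribe` calls. The wrapper name `transcribe_file` and its `load_model` parameter are hypothetical additions so the function can be exercised without downloading a model.

```python
from typing import Callable, Optional


def transcribe_file(
    path: str,
    model_name: str = "tiny",
    language: Optional[str] = None,
    load_model: Optional[Callable] = None,
) -> str:
    """Transcribe an audio or video file and return the recognized text."""
    if load_model is None:
        # Imported lazily so this module loads even without Whisper installed.
        import whisper

        load_model = whisper.load_model
    # Model sizes include "tiny", "base", "small", "medium" and "large".
    model = load_model(model_name)
    # language=None lets Whisper detect the language itself.
    result = model.transcribe(path, language=language)
    return result["text"].strip()


# Example call (requires Whisper, ffmpeg and the file to exist):
# print(transcribe_file("test1.mp4", model_name="tiny", language="zh"))
```

Choosing `tiny` trades accuracy for speed, which is the trade-off to evaluate for near-real-time use.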
