Using OpenAI's Open-Source Speech Recognition Model Whisper in .NET
Preface
On September 21, 2022, OpenAI open-sourced the Whisper neural network, whose English speech recognition is claimed to reach human-level accuracy; it also supports automatic speech recognition in 98 other languages. The automatic speech recognition (ASR) models that Whisper provides are trained for speech recognition and translation tasks: they can transcribe speech in many languages into text, and can also translate that speech into English.
Whisper's core feature is speech recognition. For most people, it makes quickly turning recordings of meetings, lectures, and classes into text transcripts much easier; for film and TV fans, it can automatically generate subtitles for material that has none, with no more waiting on subtitle groups; for learners of a foreign language, transcribing your pronunciation practice with Whisper is a good way to check your spoken level. Of course, the major cloud platforms all offer speech recognition services, but those generally run over the network, which always carries privacy risks. Whisper is completely different: it runs entirely locally, no internet connection required, which fully protects personal privacy, and its recognition accuracy is quite high.
Whisper itself is written in Python; whisper.cpp is a high-performance C++ port of it, and sandrohanea's Whisper.net wraps that port for .NET.
This article walks through how I used the open-source speech recognition model Whisper in a .NET web project, both for my own future reference and in the hope that it helps you.
The .NET web project targets .NET 6.0.
Installing the Whisper.net Packages
First, install the Whisper.net packages in the Core project. In the NuGet package manager, search for and install the Whisper.net and Whisper.net.Runtime packages.
Note that the packages we want are Whisper.net and Whisper.net.Runtime, not WhisperNet or Whisper.Runtime.
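If you prefer the command line, the same two packages can be added with the dotnet CLI, run from the Core project's directory:
dotnet add package Whisper.net
dotnet add package Whisper.net.Runtime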
Downloading the Model Files
Download Whisper's model files from Hugging Face. There are five models: ggml-tiny.bin, ggml-base.bin, ggml-small.bin, ggml-medium.bin, and ggml-large.bin. File size increases in that order, and so does recognition accuracy. Note also that the xxx.en.bin files are English-only models, while the plain xxx.bin files support many languages.
Place the model files in the project; here they live under the Web project's wwwroot directory (the code below reads them from wwwroot/WhisperModel).
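If you would rather fetch a model from code, Whisper.net also publishes a companion Whisper.net.Ggml package with a download helper. A minimal sketch, assuming that package is installed (the rest of this article just uses manually downloaded files):
using Whisper.net.Ggml;

// Download ggml-base.bin once and cache it alongside the other models.
var modelPath = "wwwroot/WhisperModel/ggml-base.bin";
if (!File.Exists(modelPath))
{
    using var modelStream = await WhisperGgmlDownloader.GetGgmlModelAsync(GgmlType.Base);
    using var fileWriter = File.OpenWrite(modelPath);
    await modelStream.CopyToAsync(fileWriter);
}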
Creating a Whisper Helper Class
WhisperHelper.cs
using Whisper.net;
using System.IO;
using System.Collections.Generic;
using Market.Core.Enum;

namespace Market.Core.Util
{
    public class WhisperHelper
    {
        // Shared across requests so the model is only loaded once.
        public static List<SegmentData> Segments { get; set; }
        public static WhisperProcessor Processor { get; set; }

        public WhisperHelper(ASRModelType modelType)
        {
            if (Segments == null || Processor == null)
            {
                Segments = new List<SegmentData>();
                var binName = "ggml-large.bin";
                switch (modelType)
                {
                    case ASRModelType.WhisperTiny:
                        binName = "ggml-tiny.bin";
                        break;
                    case ASRModelType.WhisperBase:
                        binName = "ggml-base.bin";
                        break;
                    case ASRModelType.WhisperSmall:
                        binName = "ggml-small.bin";
                        break;
                    case ASRModelType.WhisperMedium:
                        binName = "ggml-medium.bin";
                        break;
                    case ASRModelType.WhisperLarge:
                        binName = "ggml-large.bin";
                        break;
                    default:
                        break;
                }
                var modelFilePath = $"wwwroot/WhisperModel/{binName}";
                var factory = WhisperFactory.FromPath(modelFilePath);
                var builder = factory.CreateBuilder()
                    .WithLanguage("zh") // Chinese
                    .WithSegmentEventHandler(Segments.Add); // collect each recognized segment
                Processor = builder.Build();
            }
        }

        /// <summary>
        /// Full speech recognition (singleton-style implementation)
        /// </summary>
        /// <returns>The recognized text, one line per segment</returns>
        public string FullDetection(Stream speechStream)
        {
            Segments.Clear();
            var txtResult = string.Empty;
            // Run recognition; the segment event handler fills Segments
            Processor.Process(speechStream);
            // Concatenate the recognized segments into one string
            foreach (var segment in Segments)
            {
                txtResult += segment.Text + "\n";
            }
            Segments.Clear();
            return txtResult;
        }
    }
}
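A quick sketch of calling the helper directly (sample-16k.wav is a placeholder for a 16 kHz mono WAV file, which is what the recorder on the page below produces):
var helper = new WhisperHelper(ASRModelType.WhisperBase);
using var wavStream = File.OpenRead("sample-16k.wav");
var text = helper.FullDetection(wavStream);
Console.WriteLine(text);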
ModelType.cs
The model files have different names, so an enum type is used to distinguish them:
using System.ComponentModel;

namespace Market.Core.Enum
{
    /// <summary>
    /// ASR model type
    /// </summary>
    [Description("ASR model type")]
    public enum ASRModelType
    {
        /// <summary>
        /// ASRT
        /// </summary>
        [Description("ASRT")]
        ASRT = 0,

        /// <summary>
        /// WhisperTiny
        /// </summary>
        [Description("WhisperTiny")]
        WhisperTiny = 100,

        /// <summary>
        /// WhisperBase
        /// </summary>
        [Description("WhisperBase")]
        WhisperBase = 110,

        /// <summary>
        /// WhisperSmall
        /// </summary>
        [Description("WhisperSmall")]
        WhisperSmall = 120,

        /// <summary>
        /// WhisperMedium
        /// </summary>
        [Description("WhisperMedium")]
        WhisperMedium = 130,

        /// <summary>
        /// WhisperLarge
        /// </summary>
        [Description("WhisperLarge")]
        WhisperLarge = 140,

        /// <summary>
        /// PaddleSpeech
        /// </summary>
        [Description("PaddleSpeech")]
        PaddleSpeech = 200,
    }
}
Back End: Receiving and Recognizing Audio
The back-end endpoint receives the audio as Base64-encoded bytes and runs speech recognition through the Whisper helper class.
The key code is as follows:
public class ASRModel
{
    public string samples { get; set; }
    public ASRModelType ModelType { get; set; } // selects which ASR model to run
}

/// <summary>
/// Speech recognition
/// </summary>
[HttpPost]
[Route("/auth/speechRecogize")]
public async Task<IActionResult> SpeechRecogizeAsync([FromBody] ASRModel model)
{
    ResultDto result = new ResultDto();
    byte[] wavData = Convert.FromBase64String(model.samples);
    model.samples = null; // let the large Base64 string be collected
    // Run speech recognition with the Whisper helper class
    var speechStream = new MemoryStream(wavData);
    var whisperManager = new WhisperHelper(model.ModelType);
    var textResult = whisperManager.FullDetection(speechStream);
    speechStream.Dispose(); // release the audio buffer
    speechStream = null;
    wavData = null;
    result.Data = textResult;
    return Json(result.OK());
}
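To exercise the endpoint without the page below, a small console client can post a Base64-encoded WAV. A sketch, assuming the site listens on https://localhost:5001 and sample-16k.wav exists:
using System.Net.Http.Json;

var wavBytes = await File.ReadAllBytesAsync("sample-16k.wav");
using var http = new HttpClient { BaseAddress = new Uri("https://localhost:5001") };
var response = await http.PostAsJsonAsync("/auth/speechRecogize", new
{
    samples = Convert.ToBase64String(wavBytes),
    modelType = 100 // ASRModelType.WhisperTiny
});
Console.WriteLine(await response.Content.ReadAsStringAsync());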
Front End: Recording and Uploading Audio
The front end's main job is audio capture: it records the user's voice, converts the recording to a Base64 string, and posts it to the back-end API.
The page code is as follows:
@{
Layout = null;
}
@using Karambolo.AspNetCore.Bundling.ViewHelpers
@addTagHelper *, Karambolo.AspNetCore.Bundling
@addTagHelper *, Microsoft.AspNetCore.Mvc.TagHelpers
<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<title>Voice Recording</title>
<meta name="viewport" content="width=device-width, user-scalable=no, initial-scale=1.0, maximum-scale=1.0, minimum-scale=1.0">
<environment names="Development">
<link href="~/content/plugins/element-ui/index.css" rel="stylesheet" />
<script src="~/content/plugins/jquery/jquery-3.4.1.min.js"></script>
<script src="~/content/js/matomo.js"></script>
<script src="~/content/js/slick.min.js"></script>
<script src="~/content/js/masonry.js"></script>
<script src="~/content/js/instafeed.min.js"></script>
<script src="~/content/js/headroom.js"></script>
<script src="~/content/js/readingTime.min.js"></script>
<script src="~/content/js/script.js"></script>
<script src="~/content/js/prism.js"></script>
<script src="~/content/js/recorder-core.js"></script>
<script src="~/content/js/wav.js"></script>
<script src="~/content/js/waveview.js"></script>
<script src="~/content/js/vue.js"></script>
<script src="~/content/plugins/element-ui/index.js"></script>
<script src="~/content/js/request.js"></script>
</environment>
<environment names="Stage,Production">
@await Styles.RenderAsync("~/bundles/login.css")
@await Scripts.RenderAsync("~/bundles/login.js")
</environment>
<style>
html,
body {
margin: 0;
height: 100%;
}
body {
padding: 20px;
box-sizing: border-box;
}
audio {
display:block;
}
audio + audio {
margin-top: 20px;
}
.el-textarea .el-textarea__inner {
color: #000 !important;
font-size: 18px;
font-weight: 600;
}
#app {
height: 100%;
}
.content {
height: calc(100% - 130px);
overflow: auto;
}
.content > div {
margin: 10px 0 20px;
}
.press {
height: 40px;
line-height: 40px;
border-radius: 5px;
border: 1px solid #dcdfe6;
cursor: pointer;
width: 100%;
text-align: center;
background: #fff;
}
</style>
</head>
<body>
<div id="app">
<div style="display: flex; justify-content: space-between; align-items: center;">
<center>{{isPC? 'Desktop version' : 'Mobile version'}}</center>
<center style="margin: 10px 0">
<el-radio-group v-model="modelType">
<el-radio :label="0">ASRT</el-radio>
<el-radio :label="100">WhisperTiny</el-radio>
<el-radio :label="110">WhisperBase</el-radio>
<el-radio :label="120">WhisperSmall</el-radio>
<el-radio :label="130">WhisperMedium</el-radio>
<el-radio :label="140">WhisperLarge</el-radio>
<el-radio :label="200">PaddleSpeech</el-radio>
</el-radio-group>
</center>
<el-button type="primary" size="small" onclick="window.location.href = '/'">Back</el-button>
</div>
<div class="content" id="wav_pannel">
@*{{textarea}}*@
</div>
<div style="margin-top: 20px"></div>
<center style="height: 40px;"><h4 id="msgbox" v-if="messageStatus">{{message}}</h4></center>
<button class="press" v-on:touchstart="start" v-on:touchend="end" v-if="!isPC">
Hold to Talk
</button>
<button class="press" v-on:mousedown="start" v-on:mouseup="end" v-else>
Hold to Talk
</button>
</div>
</body>
</html>
<script>
    var blob_wav_current;
    var rec;
    var recOpen = function (success) {
        rec = Recorder({
            type: "wav",
            sampleRate: 16000,
            bitRate: 16,
            onProcess: (buffers, powerLevel, bufferDuration, bufferSampleRate, newBufferIdx, asyncEnd) => {
            }
        });
        rec.open(() => {
            success && success();
        }, (msg, isUserNotAllow) => {
            app.textarea = (isUserNotAllow ? "UserNotAllow," : "") + "Cannot record: " + msg;
        });
    };
    var app = new Vue({
        el: '#app',
        data: {
            textarea: '',
            message: '',
            messageStatus: false,
            modelType: 0,
        },
        computed: {
            isPC() {
                var userAgentInfo = navigator.userAgent;
                var Agents = ["Android", "iPhone", "SymbianOS", "Windows Phone", "iPod", "iPad"];
                var flag = true;
                for (var i = 0; i < Agents.length; i++) {
                    if (userAgentInfo.indexOf(Agents[i]) > 0) {
                        flag = false;
                        break;
                    }
                }
                return flag;
            }
        },
        methods: {
            start() {
                app.message = "Recording...";
                app.messageStatus = true;
                recOpen(function () {
                    app.recStart();
                });
            },
            end() {
                if (rec) {
                    rec.stop(function (blob, duration) {
                        app.messageStatus = false;
                        rec.close();
                        rec = null;
                        blob_wav_current = blob;
                        var audio = document.createElement("audio");
                        audio.controls = true;
                        var dom = document.getElementById("wav_pannel");
                        dom.appendChild(audio);
                        audio.src = (window.URL || webkitURL).createObjectURL(blob);
                        //audio.play();
                        app.upload();
                    }, function (msg) {
                        console.log("Recording failed: " + msg);
                        rec.close();
                        rec = null;
                    });
                    app.message = "Recording stopped";
                }
            },
            upload() {
                app.message = "Uploading for recognition...";
                app.messageStatus = true;
                var blob = blob_wav_current;
                var reader = new FileReader();
                reader.onloadend = function () {
                    var data = {
                        samples: (/.+;\s*base64\s*,\s*(.+)$/i.exec(reader.result) || [])[1],
                        sample_rate: 16000,
                        channels: 1,
                        byte_width: 2,
                        modelType: app.modelType
                    };
                    // Post as JSON so that [FromBody] model binding works on the server
                    $.ajax({
                        url: '/auth/speechRecogize',
                        type: 'POST',
                        contentType: 'application/json',
                        data: JSON.stringify(data),
                        success: function (res) {
                            if (res.data && res.data.statusCode == 200000) {
                                app.messageStatus = false;
                                app.textarea = res.data.text == '' ? 'Nothing was recognized, please try again' : res.data.text;
                            } else {
                                app.textarea = "Recognition failed";
                            }
                            var dom = document.getElementById("wav_pannel");
                            var div = document.createElement("div");
                            div.innerHTML = app.textarea;
                            dom.appendChild(div);
                            $('#wav_pannel').animate({ scrollTop: $('#wav_pannel')[0].scrollHeight - $('#wav_pannel')[0].offsetHeight });
                        }
                    });
                };
                reader.readAsDataURL(blob);
            },
            recStart() {
                rec.start();
            },
        }
    })
</script>
References
Whisper official site
Basic usage of the offline audio-to-text model Whisper.net
whisper.cpp on GitHub
Whisper.net on GitHub
Whisper model downloads