
Speech-to-Text (Video Subtitles) with Whisper

2 years ago, Author: Helloorld_1, Category: Toy blog


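While studying from YouTube videos, I found that many had no subtitles, and my listening comprehension is not great. So I decided to build a tool that extracts subtitles from audio or video.

After looking through many open-source repositories, I found Whisper, which OpenAI released some time ago.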

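The original repository is linked below:

openai/whisper: Robust Speech Recognition via Large-Scale Weak Supervision (github.com) https://github.com/openai/whisper

First download this repository. After extracting it, the contents look like this:

[Screenshot: the extracted repository]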

Because the audio has to be processed, we also need to download ffmpeg.

Extract it, then add the path of its bin folder to the PATH environment variable.
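Before going further, it can help to confirm that the PATH change took effect. The sketch below is only a convenience check, not part of the original setup; the function name `find_ffmpeg` is made up for this example, and it relies only on the standard library's `shutil.which`.

```python
import shutil


def find_ffmpeg(executable: str = "ffmpeg"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


if __name__ == "__main__":
    ffmpeg_path = find_ffmpeg()
    if ffmpeg_path is None:
        # A newly edited PATH is only visible to terminals opened afterwards.
        print("ffmpeg was not found on PATH; add its bin folder and reopen the terminal.")
    else:
        print(f"Found ffmpeg at {ffmpeg_path}")
```

If the check fails right after editing the environment variables, reopening the terminal (or IDE) is usually the first thing to try.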

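I set up the environment with Anaconda.

For one-step environment setup, you can use the resource I uploaded (it costs 1 point):

Python configuration for whisper, containing an environment.yaml file that helps downloaders set up the environment quickly (resource in the CSDN library)

Running conda env create -f environment.yaml quickly creates a conda virtual environment.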

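The environment can also be configured with pip. Either of the two commands below should be enough.

To install the latest release:

pip install -U openai-whisper

Or, to install the latest commit from the repository:

pip install git+https://github.com/openai/whisper.git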
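Once the installation finishes, a quick check that the packages are visible to the active Python environment can save some debugging later. This is a minimal sketch; `missing_modules` is a helper name invented for this example, and `whisper` and `torch` are simply the import names used by the script in this post.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(["whisper", "torch"])
    if missing:
        print("Not installed in this environment:", ", ".join(missing))
    else:
        print("whisper and torch are both importable.")
```

If something is reported as missing, the most likely cause is that the conda environment was not activated before running pip.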

I hope this helps. The resource also contains a Python file to run; the code is as follows:

import whisper
import io
import time
import os
import json
import pathlib
import torch

# Choose model to use by uncommenting
#modelName = "tiny.en"
#modelName = "base.en"
#modelName = "small.en"
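#modelName = "medium.en"
# Edit the line below to choose the model (a model name or a path to a .pt file)
modelName = "model/large-v2.pt"
# device=torch.device('cuda:0'if torch.cuda.is_available() else "cpu")
torch.cuda.empty_cache()
# TODO: runs on the CPU; use the commented-out line above to run on a GPU instead
device=torch.device("cpu")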
# Other Variables
exportTimestampData =False # (bool) Whether to export the segment data to a json file. Will include word level timestamps if word_timestamps is True.
outputFolder = "Output"
exportTimevtt=True

#  ----- Select variables for transcribe method  -----
# audio: path to audio file
verbose = False # (bool): Whether to display the text being decoded to the console. If True, displays all the details, If False, displays minimal details. If None, does not display anything
language="Chinese" # Language of audio file
word_timestamps=False # (bool): Extract word-level timestamps using the cross-attention pattern and dynamic time warping, and include the timestamps for each word in each segment.
#initial_prompt="" # (optional str): Optional text to provide as a prompt for the first window. This can be used to provide, or "prompt-engineer" a context for transcription, e.g. custom vocabularies or proper nouns to make it more likely to predict those word correctly.

#  -------------------------------------------------------------------------
print(f"Using Model: {modelName}")
# filePath = input("Path to File Being Transcribed: ")
# filePath = filePath.strip("\"")

filePath = r"F:\CloudMusic\1.mp3"
if not os.path.exists(filePath):
	print("Problem Getting File...")
	input("Press Enter to Exit...")
	exit()

# If output folder does not exist, create it
if not os.path.exists(outputFolder):
	os.makedirs(outputFolder)
	print("Created Output Folder.\n")

# Get filename stem using pathlib (filename without extension)
fileNameStem = pathlib.Path(filePath).stem

vttFileName=f"{fileNameStem}.vtt"
resultFileName = f"{fileNameStem}.txt"
jsonFileName = f"{fileNameStem}.json"

model = whisper.load_model(modelName,device)
start = time.time()

#  ---------------------------------------------------
result = model.transcribe(audio=filePath, language=language, word_timestamps=word_timestamps, verbose=verbose,fp16=False)  # the audio is processed in 30-second windows
#  ---------------------------------------------------

end = time.time()
elapsed = float(end - start)  # total elapsed time in seconds
print(result["segments"])  # print the segments; they are written to the subtitle file below
# Save transcription text to file
print("\nWriting transcription to file...")
with open(os.path.join(outputFolder, resultFileName), "w", encoding="utf-8") as file:
	file.write(result["text"])
print("Finished writing transcription file.")

# Save the segments data to json file
#if word_timestamps == True:
if exportTimestampData == True:
	print("\nWriting segment data to file...")
	with open(os.path.join(outputFolder, jsonFileName), "w", encoding="utf-8") as file:
		segmentsData = result["segments"]
		# ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes
		json.dump(segmentsData, file, indent=4, ensure_ascii=False)
	print("Finished writing segment data file.")
if exportTimevtt==True:
	print("\nWriting segment data to vtt file...")
	with open(os.path.join(outputFolder, vttFileName), "w", encoding="utf-8") as f:
		# Write the header line; a WebVTT file must start with "WEBVTT"
		f.write("WEBVTT\n\n")
		# Iterate over every cue (segment) in the result
		for cue in result["segments"]:
			# Read the start and end times in seconds and convert them to VTT format
			# (named cue_start/cue_end so the timing variables above are not overwritten)
			cue_start = cue["start"]
			cue_end = cue["end"]
			start_h = int(cue_start // 3600)
			start_m = int((cue_start % 3600) // 60)
			start_s = int(cue_start % 60)
			start_ms = int((cue_start % 1) * 1000)
			end_h = int(cue_end // 3600)
			end_m = int((cue_end % 3600) // 60)
			end_s = int(cue_end % 60)
			end_ms = int((cue_end % 1) * 1000)
			start_str = f"{start_h:02}:{start_m:02}:{start_s:02}.{start_ms:03}"
			end_str = f"{end_h:02}:{end_m:02}:{end_s:02}.{end_ms:03}"
			# Get the text, trimming surrounding whitespace and replacing line breaks
			text = cue["text"].strip().replace("\n", " ")
			# Write the timing line and the text, followed by a blank line
			f.write(f"{start_str} --> {end_str}\n")
			f.write(f"{text}\n\n")
	print("Finished writing segment vtt data file.")

elapsedMinutes = str(round(elapsed/60, 2))
print(f"\nElapsed Time With {modelName} Model: {elapsedMinutes} Minutes")

# input("Press Enter to exit...")
exit()
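The timestamp arithmetic in the script can be factored into a small helper, which also makes it easy to produce .srt output alongside .vtt. The sketch below is one possible way to do that and is not taken from the original resource: the function names are invented here, and it assumes each segment is a dict with "start", "end" and "text" keys, as in result["segments"]. It rounds once to whole milliseconds, which avoids occasionally losing a millisecond when a float fraction is truncated with int().

```python
def format_timestamp(seconds: float, decimal_marker: str = ".") -> str:
    """Format a time in seconds as HH:MM:SS<marker>mmm."""
    total_ms = round(seconds * 1000.0)  # round once, then use integer arithmetic
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{millis:03d}"


def segments_to_vtt(segments) -> str:
    """Build WebVTT text: a WEBVTT header followed by one cue per segment."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        text = seg["text"].strip().replace("\n", " ")
        lines.append(f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}")
        lines.append(text)
        lines.append("")
    return "\n".join(lines)


def segments_to_srt(segments) -> str:
    """Build SRT text: numbered cues, with a comma before the milliseconds."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        text = seg["text"].strip().replace("\n", " ")
        start = format_timestamp(seg["start"], decimal_marker=",")
        end = format_timestamp(seg["end"], decimal_marker=",")
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [{"start": 0.0, "end": 1.5, "text": " hello "}]
    print(segments_to_vtt(demo))
    print(segments_to_srt(demo))
```

In the script, the returned strings could be written with file.write(segments_to_srt(result["segments"])) to a file opened with encoding="utf-8".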

The code above can be changed to run on the CPU or the GPU, depending on your needs.
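A small helper can make that choice automatically instead of editing the script by hand. This is a sketch under stated assumptions: `pick_device` and `use_fp16` are names invented for this example, and the idea that half precision is only worth enabling on a CUDA device reflects my understanding that Whisper falls back to FP32 on the CPU.

```python
def pick_device(prefer_gpu: bool = True) -> str:
    """Return "cuda:0" when a GPU is requested and available, otherwise "cpu"."""
    if not prefer_gpu:
        return "cpu"
    try:
        import torch  # imported lazily so the helper still works without torch
    except ImportError:
        return "cpu"
    return "cuda:0" if torch.cuda.is_available() else "cpu"


def use_fp16(device: str) -> bool:
    """Enable half precision only on CUDA devices."""
    return device.startswith("cuda")


if __name__ == "__main__":
    device = pick_device()
    print(f"device={device}, fp16={use_fp16(device)}")
```

The resulting string could then be passed as the device argument of whisper.load_model, and use_fp16(device) as the fp16 argument of model.transcribe.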

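You also need to download a model; the models can be found through the repository link.

Option 1: change modelName in the code above to a model name such as large-v2 (a name rather than a path to a .pt file), and the model is downloaded when the script first runs. By default it is saved to C:\Users\Lenovo\.cache\whisper, that is, the .cache\whisper folder under the user's home directory.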
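To see in advance which of the two cases applies (a local checkpoint file versus a model name that triggers a download), something like the following can be used. It is a sketch based on my understanding of how Whisper resolves its default cache folder (XDG_CACHE_HOME if set, otherwise ~/.cache); both function names are hypothetical and nothing here calls Whisper itself.

```python
import os


def default_whisper_cache_dir() -> str:
    """Return the folder where models are assumed to be cached by default."""
    cache_root = os.getenv(
        "XDG_CACHE_HOME", os.path.join(os.path.expanduser("~"), ".cache")
    )
    return os.path.join(cache_root, "whisper")


def describe_model_argument(model: str) -> str:
    """Say whether `model` looks like a local checkpoint file or a model name."""
    if os.path.isfile(model):
        return f"local checkpoint: {os.path.abspath(model)}"
    return (
        f"model name: {model} "
        f"(expected to be downloaded to {default_whisper_cache_dir()} if missing)"
    )


if __name__ == "__main__":
    print(describe_model_argument("model/large-v2.pt"))
    print(describe_model_argument("large-v2"))
```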

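Option 2: use the command line (open a terminal in the current directory with the conda Python environment activated).

Then enter the following command:

whisper audio.mp3 audio.wav --model base --model_dir <model download directory>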
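The same command can be assembled from Python, which is handy for batch jobs. This is a minimal sketch: `build_whisper_command` is a hypothetical helper, and besides --model and --model_dir from the command above it uses the --language and --output_format options of the whisper command-line tool. It only builds the argument list; running it is left to subprocess.run.

```python
import shlex


def build_whisper_command(audio_files, model="base", model_dir=None,
                          language=None, output_format=None):
    """Return the argument list for the whisper command-line tool."""
    command = ["whisper", *audio_files, "--model", model]
    if model_dir:
        command += ["--model_dir", model_dir]
    if language:
        command += ["--language", language]
    if output_format:
        command += ["--output_format", output_format]
    return command


if __name__ == "__main__":
    cmd = build_whisper_command(["audio.mp3"], model="base", model_dir="models",
                                language="Chinese", output_format="srt")
    # Print only; to execute it: subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
```

Passing the arguments as a list avoids quoting problems with paths that contain spaces.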

In my tests, speech recognition worked for both Chinese and English, and for both mp4 and mp3 input files.
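Whisper reads video files through ffmpeg, so no separate conversion was needed in those tests. If you still want a standalone audio file (for example to reuse it across runs), the audio track can be extracted first. The sketch below only builds the ffmpeg command; the function name and the choice of 16 kHz mono WAV output are assumptions made for this example.

```python
def build_extract_audio_command(video_path: str, audio_path: str,
                                sample_rate: int = 16000):
    """Return an ffmpeg argument list that extracts a mono audio track."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", video_path,         # input video
        "-vn",                    # drop the video stream
        "-ac", "1",               # downmix to one channel
        "-ar", str(sample_rate),  # resample to the given rate
        audio_path,
    ]


if __name__ == "__main__":
    cmd = build_extract_audio_command("input.mp4", "output.wav")
    # Print only; to execute it: subprocess.run(cmd, check=True)
    print(" ".join(cmd))
```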

An exe built on top of Whisper (not my own work) looks like this:

[Screenshot] The initialization screen, where the model location is configured.

[Screenshot] The audio-to-text screen.

[Screenshot] The microphone-input-to-text screen.

I uploaded this exe file to CSDN; feel free to download it if you need it.

Whisper exe file resource (CSDN library)

It needs a model file to be loaded (download one from the repository link below):

whisper.cpp/models at master · ggerganov/whisper.cpp (github.com)
