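An earlier post covered the iFlytek input method, which supports voice input and can also take the computer's internal audio as its source. For details, see: [Real-time speech to text] Real-time speech-to-text on PC (external microphone audio & internal system audio)
That input method is only a tool, though. Suppose we want to build something fun ourselves, such as controlling the computer by voice to automate tasks. We first have to capture speech and convert it to text, then parse the text to drive the platform. That means our own program needs access to the recognized text, which the iFlytek input method cannot provide, so we need an API. After comparing the speech APIs from several major vendors inside and outside China, I found Google's API to be the "smartest" and the best at "understanding human speech".
Note: because this uses Google's API, your network must be able to reach Google.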
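Preparation
Official documentation: Cloud Speech-to-Text > Documentation > Before you begin
Just follow the official documentation step by step. In brief, the process is:
- Set up a Google Cloud project
- Make sure a billing account is linked to the project
- Enable the Speech-to-Text API
- Create a new service account
- Create a JSON key
- Set the authentication environment variable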
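Before calling the API it can help to confirm that the last two steps worked. The following is a minimal sketch, not part of the original post: the helper name check_service_account_key and the list of required fields are assumptions made for illustration, based on the fields that a downloaded service-account key file normally contains.

```python
import json
import os

# Fields assumed to be present in a service-account key file (illustrative subset).
REQUIRED_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(path):
    """Return a list of problems found in the key file (empty list if none)."""
    if not path:
        return ["GOOGLE_APPLICATION_CREDENTIALS is not set"]
    if not os.path.isfile(path):
        return ["key file not found: {}".format(path)]
    try:
        with open(path, encoding="utf-8") as fh:
            key = json.load(fh)
    except ValueError as exc:  # covers invalid JSON and bad encodings
        return ["key file is not valid JSON: {}".format(exc)]
    if not isinstance(key, dict):
        return ["key file does not contain a JSON object"]
    problems = [
        "missing field: {}".format(name)
        for name in REQUIRED_FIELDS
        if name not in key
    ]
    if "type" in key and key["type"] != "service_account":
        problems.append("type is not service_account: {}".format(key["type"]))
    return problems


if __name__ == "__main__":
    key_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    for problem in check_service_account_key(key_path):
        print("Problem:", problem)
```

Running it with the environment variable set prints nothing when the file looks like a service-account key, and one line per problem otherwise. It only inspects the file locally and does not contact Google.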
Python example: audio file to text
Set up a Python environment and install the dependencies:
- google-cloud-speech==2.16.2
- pyaudio==0.2.12
- six==1.16.0
if __name__ == "__main__":
    # Imports the Google Cloud client library
    from google.cloud import speech
    import os

    os.environ["http_proxy"] = "http://127.0.0.1:7890"
    os.environ["https_proxy"] = "http://127.0.0.1:7890"
    os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "xxxxx.json"

    # Instantiates a client
    client = speech.SpeechClient()

    # The name of the audio file to transcribe
    gcs_uri = "gs://cloud-samples-data/speech/brooklyn_bridge.raw"
    audio = speech.RecognitionAudio(uri=gcs_uri)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )

    # Detects speech in the audio file
    response = client.recognize(config=config, audio=audio)
    for result in response.results:
        print("Transcript: {}".format(result.alternatives[0].transcript))
Console output: one line per result, in the form "Transcript: " followed by the recognized text.
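The sample above reads its audio from a Cloud Storage URI. For a short recording stored on disk, the audio bytes can be passed inline instead. The sketch below is an illustration rather than part of the original post: the function names read_wav_pcm and transcribe_local_wav are hypothetical, and it assumes a 16-bit PCM WAV file that is short enough for the synchronous recognize() call.

```python
import wave


def read_wav_pcm(path):
    """Read a WAV file and return (raw_pcm_bytes, sample_rate_hz, channel_count)."""
    with wave.open(path, "rb") as wav:
        # LINEAR16 means 16-bit samples, i.e. 2 bytes per sample.
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples for LINEAR16")
        frames = wav.readframes(wav.getnframes())
        return frames, wav.getframerate(), wav.getnchannels()


def transcribe_local_wav(path, language_code="en-US"):
    """Send a short local recording to the synchronous recognize() call."""
    # Imported here so read_wav_pcm() stays usable without the client library.
    from google.cloud import speech

    content, sample_rate, channels = read_wav_pcm(path)
    client = speech.SpeechClient()
    audio = speech.RecognitionAudio(content=content)
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=sample_rate,
        audio_channel_count=channels,
        language_code=language_code,
    )
    response = client.recognize(config=config, audio=audio)
    # Keep only the top alternative of each result.
    return [r.alternatives[0].transcript for r in response.results if r.alternatives]
```

Reading the sample rate and channel count from the file header avoids a mismatch between the configuration and the actual audio. The proxy and credential environment variables still need to be set before transcribe_local_wav is called.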
Python example: microphone speech to text
Set up a Python environment and install the dependencies:
- google-cloud-speech==2.16.2
- pyaudio==0.2.12
- six==1.16.0
#!/usr/bin/env python
from __future__ import division
import re
import sys
from google.cloud import speech
import pyaudio
from six.moves import queue
import os
os.environ["http_proxy"] = "http://127.0.0.1:7890"
os.environ["https_proxy"] = "http://127.0.0.1:7890"
os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "xxxx.json"
# Audio recording parameters
RATE = 16000
CHUNK = int(RATE / 10) # 100ms


class MicrophoneStream(object):
    """Opens a recording stream as a generator yielding the audio chunks."""

    def __init__(self, rate, chunk):
        self._rate = rate
        self._chunk = chunk

        # Create a thread-safe buffer of audio data
        self._buff = queue.Queue()
        self.closed = True

    def __enter__(self):
        self._audio_interface = pyaudio.PyAudio()
        self._audio_stream = self._audio_interface.open(
            format=pyaudio.paInt16,
            # The API currently only supports 1-channel (mono) audio
            # https://goo.gl/z757pE
            channels=1,
            rate=self._rate,
            input=True,
            frames_per_buffer=self._chunk,
            # Run the audio stream asynchronously to fill the buffer object.
            # This is necessary so that the input device's buffer doesn't
            # overflow while the calling thread makes network requests, etc.
            stream_callback=self._fill_buffer,
        )

        self.closed = False

        return self

    def __exit__(self, type, value, traceback):
        self._audio_stream.stop_stream()
        self._audio_stream.close()
        self.closed = True
        # Signal the generator to terminate so that the client's
        # streaming_recognize method will not block the process termination.
        self._buff.put(None)
        self._audio_interface.terminate()

    def _fill_buffer(self, in_data, frame_count, time_info, status_flags):
        """Continuously collect data from the audio stream, into the buffer."""
        self._buff.put(in_data)
        return None, pyaudio.paContinue

    def generator(self):
        while not self.closed:
            # Use a blocking get() to ensure there's at least one chunk of
            # data, and stop iteration if the chunk is None, indicating the
            # end of the audio stream.
            chunk = self._buff.get()
            if chunk is None:
                return
            data = [chunk]

            # Now consume whatever other data's still buffered.
            while True:
                try:
                    chunk = self._buff.get(block=False)
                    if chunk is None:
                        return
                    data.append(chunk)
                except queue.Empty:
                    break

            yield b"".join(data)


def listen_print_loop(responses):
    """Iterates through server responses and prints them.

    The responses passed is a generator that will block until a response
    is provided by the server.

    Each response may contain multiple results, and each result may contain
    multiple alternatives; for details, see https://goo.gl/tjCPAU. Here we
    print only the transcription for the top alternative of the top result.

    In this case, responses are provided for interim results as well. If the
    response is an interim one, print a carriage return at the end of it, to
    allow the next result to overwrite it, until the response is a final one.
    For the final one, print a newline to preserve the finalized transcription.
    """
    num_chars_printed = 0
    for response in responses:
        if not response.results:
            continue

        # The `results` list is consecutive. For streaming, we only care about
        # the first result being considered, since once it's `is_final`, it
        # moves on to considering the next utterance.
        result = response.results[0]
        if not result.alternatives:
            continue

        # Display the transcription of the top alternative.
        transcript = result.alternatives[0].transcript

        # Display interim results, but with a carriage return at the end of the
        # line, so subsequent lines will overwrite them.
        #
        # If the previous result was longer than this one, we need to print
        # some extra spaces to overwrite the previous result
        overwrite_chars = " " * (num_chars_printed - len(transcript))

        if not result.is_final:
            sys.stdout.write(transcript + overwrite_chars + "\r")
            sys.stdout.flush()

            num_chars_printed = len(transcript)

        else:
            print(transcript + overwrite_chars)

            # Exit recognition if any of the transcribed phrases could be
            # one of our keywords.
            if re.search(r"\b(exit|quit)\b", transcript, re.I):
                print("Exiting..")
                break

            num_chars_printed = 0


def main():
    # See http://g.co/cloud/speech/docs/languages
    # for a list of supported languages.
    language_code = "zh"  # a BCP-47 language tag

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=RATE,
        language_code=language_code,
    )

    streaming_config = speech.StreamingRecognitionConfig(
        config=config, interim_results=True
    )

    with MicrophoneStream(RATE, CHUNK) as stream:
        audio_generator = stream.generator()
        requests = (
            speech.StreamingRecognizeRequest(audio_content=content)
            for content in audio_generator
        )

        responses = client.streaming_recognize(streaming_config, requests)

        # Now, put the transcription responses to use.
        listen_print_loop(responses)


if __name__ == "__main__":
    main()
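The buffering idea behind MicrophoneStream.generator can be tried without a microphone or PyAudio. The following is a simplified sketch of that pattern: the function name drain_chunks is made up for illustration, and unlike the listing it yields the chunks it has already collected when it meets the None sentinel instead of dropping them.

```python
import queue


def drain_chunks(buff):
    """Yield joined byte strings from `buff` until a None sentinel is read."""
    while True:
        # Block until at least one chunk is available.
        chunk = buff.get()
        if chunk is None:
            return
        data = [chunk]

        # Collect whatever else is already buffered, without blocking.
        while True:
            try:
                chunk = buff.get(block=False)
            except queue.Empty:
                break
            if chunk is None:
                # End of stream: hand over what was collected, then stop.
                yield b"".join(data)
                return
            data.append(chunk)

        yield b"".join(data)


if __name__ == "__main__":
    demo = queue.Queue()
    for item in (b"ab", b"cd", None):
        demo.put(item)
    print(list(drain_chunks(demo)))
```

Joining every chunk that is already waiting means each streaming request carries as much audio as has accumulated, so a slow network call does not cause the queue to fall further and further behind the microphone.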
Speech from the microphone is converted to text and printed in real time. To process the results further, modify the listen_print_loop function.
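As one possible direction for that processing, a final transcript can be matched against a small command table, which is the voice-control idea mentioned at the start of the post. This is only a sketch: the table COMMANDS, the action names, and the function match_command are all hypothetical, and the English phrases are placeholders.

```python
import re

# Hypothetical command table: (pattern, action name). Replace with your own.
COMMANDS = [
    (re.compile(r"\bopen (?:the )?browser\b", re.I), "open_browser"),
    (re.compile(r"\b(?:volume up|louder)\b", re.I), "volume_up"),
    (re.compile(r"\b(?:exit|quit)\b", re.I), "exit"),
]


def match_command(transcript):
    """Return the action name of the first matching command, or None."""
    for pattern, action in COMMANDS:
        if pattern.search(transcript):
            return action
    return None


if __name__ == "__main__":
    for text in ("Please open the browser", "a bit louder", "hello there"):
        print(repr(text), "->", match_command(text))
```

A call such as match_command(transcript) could be placed in the is_final branch of listen_print_loop, so that interim results never trigger an action. For Chinese transcripts the \b word boundary is not useful between characters, so plain substring checks are likely a better fit there.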
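The code above is based on the official sample, with the following changes:
- Set a proxy (inside mainland China the http_proxy proxy must be set, otherwise the Google API cannot be reached)
- Set the GOOGLE_APPLICATION_CREDENTIALS environment variable. It is normally configured in the client machine's system settings; for testing it can be set directly in code. Its value is the path to the JSON key file created during preparation
- Set language_code to zh for Chinese. For the officially supported languages, see: Speech-to-Text supported languages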
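If the same script should run both on a machine where these variables are already configured and on a test machine where they are not, the values can be applied as defaults only. A minimal sketch, assuming a hypothetical helper named apply_default_env; the proxy address and key file name simply repeat the placeholder values used in the listings.

```python
import os


def apply_default_env(settings, environ=os.environ):
    """Set each variable only if it is not already defined; return what was set."""
    applied = {}
    for name, value in settings.items():
        if name not in environ:
            environ[name] = value
            applied[name] = value
    return applied


if __name__ == "__main__":
    changed = apply_default_env({
        "http_proxy": "http://127.0.0.1:7890",
        "https_proxy": "http://127.0.0.1:7890",
        "GOOGLE_APPLICATION_CREDENTIALS": "xxxx.json",  # placeholder path
    })
    print("Variables set by this script:", sorted(changed))
```

The call needs to happen before speech.SpeechClient() is created, just as the direct os.environ assignments do in the listings.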
Other official samples
Google Cloud official samples
Speech-to-Text samples
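Internal computer audio
The microphone can likewise be set to the system audio source, so that video and speech playing on the computer are converted to text in real time. This makes a decent real-time subtitle tool. For the exact steps, see [Real-time speech to text] Real-time speech-to-text on PC (external microphone audio & internal system audio); only a small amount of configuration is needed.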
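An alternative to changing the system default microphone is to pick the capture device from code. The sketch below is an assumption-laden illustration and not the method from the referenced post: the function names are hypothetical, and the device name to search for (for example "Stereo Mix" on some Windows machines) depends entirely on the sound card and drivers.

```python
def find_input_device(devices, keyword):
    """Return the index of the first input-capable device whose name contains keyword."""
    keyword = keyword.lower()
    for info in devices:
        has_input = info.get("maxInputChannels", 0) > 0
        if has_input and keyword in str(info.get("name", "")).lower():
            return info["index"]
    return None


def list_audio_devices():
    """Return PyAudio's device info dictionaries for every device it reports."""
    # Imported here so find_input_device() can be used without PyAudio installed.
    import pyaudio

    audio = pyaudio.PyAudio()
    try:
        return [
            audio.get_device_info_by_index(i)
            for i in range(audio.get_device_count())
        ]
    finally:
        audio.terminate()


if __name__ == "__main__":
    # "stereo mix" is only an example name; check the printed list on your machine.
    devices = list_audio_devices()
    for info in devices:
        print(info["index"], info["name"], info["maxInputChannels"])
    print("Selected:", find_input_device(devices, "stereo mix"))
```

The selected index could then be passed as input_device_index to the self._audio_interface.open(...) call in MicrophoneStream.__enter__, leaving the rest of the streaming code unchanged.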