
[Google Speech-to-Text] Speech to Text: a very handy speech-to-text API

2 years ago | Author: 優小U | Category: Toy博客 | Views: 65


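An earlier post covered the iFlytek input method (訊飛輸入法), which supports voice input and can also take the computer's internal audio as its source. For details, see: [Real-time speech to text] Real-time speech-to-text on PC (external microphone audio & internal system audio)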

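That, however, is only a tool. If we want to build something fun ourselves, such as controlling the computer by voice to run automated actions, we first need to capture speech and convert it to text, then parse that text to drive the platform. This means we need programmatic access to the recognition result, which the iFlytek input method cannot provide, so we need an API. After comparing the speech APIs of several major vendors in China and abroad, I found Google's API to be the "smartest" and the best at "understanding what people actually say".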
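The "parse the text, then act on it" step can be sketched roughly as follows. This is only an illustration: the command table, the action names, and the `parse_command` helper are hypothetical and not part of any Google API.

```python
import re

# Hypothetical command table: spoken phrase pattern -> action name.
COMMANDS = [
    (re.compile(r"\bopen (?:the )?browser\b", re.I), "open_browser"),
    (re.compile(r"\bvolume up\b", re.I), "volume_up"),
    (re.compile(r"\b(?:exit|quit)\b", re.I), "exit"),
]


def parse_command(transcript):
    """Return the first action whose pattern matches the transcript, else None."""
    for pattern, action in COMMANDS:
        if pattern.search(transcript):
            return action
    return None
```

A recognized transcript would be passed to `parse_command`, and the returned action name could then be mapped to whatever automation you want to trigger.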

Note: because this uses Google's API, you need a network environment that can reach Google.

Preparation

Official documentation: Cloud Speech-to-Text > Documentation > Before you begin

Just follow the official documentation step by step. In brief, the process is:

Set up a Google Cloud project
Make sure a billing account is linked to the project
Enable the Speech-to-Text API
Create a new service account
Create a JSON key
Set the authentication environment variable
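Once the steps above are done, it can be worth checking that the environment variable really points at a service-account key before calling the API. The following is a minimal sketch; `check_credentials` is a hypothetical helper, and it assumes the key file is the standard service-account JSON containing the `type` and `client_email` fields.

```python
import json
import os


def check_credentials(env=os.environ):
    """Return the service-account e-mail from the key file, or raise ValueError."""
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise ValueError("GOOGLE_APPLICATION_CREDENTIALS is not set")
    if not os.path.isfile(path):
        raise ValueError("key file not found: " + path)
    with open(path, encoding="utf-8") as f:
        key = json.load(f)
    if key.get("type") != "service_account":
        raise ValueError("not a service account key file")
    return key.get("client_email")
```

Calling `check_credentials()` at startup turns a misconfigured path into an immediate, readable error rather than a failure deep inside the client library.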

Audio file to text: Python example

Prepare the Python environment and install the dependencies:

google-cloud-speech==2.16.2
pyaudio==0.2.12
six==1.16.0


# Imports the Google Cloud client library
import os

from google.cloud import speech

# Route traffic through a local proxy; adjust the address to your own setup.
os.environ["http_proxy"] = "http://127.0.0.1:7890"
os.environ["https_proxy"] = "http://127.0.0.1:7890"

# Path to the JSON key created during preparation
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "xxxxx.json"


def main():
    # Instantiates a client
    client = speech.SpeechClient()

    # Cloud Storage URI of the audio file to transcribe
    gcs_uri = "gs://cloud-samples-data/speech/brooklyn_bridge.raw"

    audio = speech.RecognitionAudio(uri=gcs_uri)

    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=16000,
        language_code="en-US",
    )

    # Detects speech in the audio file
    response = client.recognize(config=config, audio=audio)

    for result in response.results:
        print("Transcript: {}".format(result.alternatives[0].transcript))


if __name__ == "__main__":
    main()

Console output: [screenshot of the printed transcript]
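The config in the example declares LINEAR16 encoding at 16000 Hz, which has to match the audio actually sent. For a WAV file of your own, the header can be inspected with the standard library first. This is a sketch only; `wav_recognition_params` and `is_linear16_mono` are hypothetical helper names.

```python
import wave


def wav_recognition_params(path):
    """Read a WAV header and return the values the recognition config depends on."""
    with wave.open(path, "rb") as wav:
        return {
            "sample_rate_hertz": wav.getframerate(),
            "channels": wav.getnchannels(),
            "bits_per_sample": wav.getsampwidth() * 8,
        }


def is_linear16_mono(params):
    """LINEAR16 means 16-bit PCM samples; the examples in this post use mono audio."""
    return params["bits_per_sample"] == 16 and params["channels"] == 1
```

The returned `sample_rate_hertz` can be passed to `RecognitionConfig`. For a local file, the bytes would go into `speech.RecognitionAudio(content=...)` in place of the `uri` argument used above.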

Microphone speech to text: Python example

Prepare the Python environment and install the dependencies:

google-cloud-speech==2.16.2
pyaudio==0.2.12
six==1.16.0

#!/usr/bin/env python

from __future__ import division

import re
import sys

from google.cloud import speech

import pyaudio
from six.moves import queue

import os
os.environ["http_proxy"] = "http://127.0.0.1:7890"
os.environ["https_proxy"] = "http://127.0.0.1:7890"

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = "xxxx.json"

# Audio recording parameters
RATE = 16000
CHUNK = int(RATE / 10)  # 100ms


class MicrophoneStream(object):
    """Opens a recording stream as a generator yielding the audio chunks."""

    def __init__(self, rate, chunk):
        self._rate = rate
        self._chunk = chunk

        # Create a thread-safe buffer of audio data
        self._buff = queue.Queue()
        self.closed = True

    def __enter__(self):
        self._audio_interface = pyaudio.PyAudio()
        self._audio_stream = self._audio_interface.open(
            format=pyaudio.paInt16,
            # The API currently only supports 1-channel (mono) audio
            # https://goo.gl/z757pE
            channels=1,
            rate=self._rate,
            input=True,
            frames_per_buffer=self._chunk,
            # Run the audio stream asynchronously to fill the buffer object.
            # This is necessary so that the input device's buffer doesn't
            # overflow while the calling thread makes network requests, etc.
            stream_callback=self._fill_buffer,
        )

        self.closed = False

        return self

    def __exit__(self, type, value, traceback):
        self._audio_stream.stop_stream()
        self._audio_stream.close()
        self.closed = True
        # Signal the generator to terminate so that the client's
        # streaming_recognize method will not block the process termination.
        self._buff.put(None)
        self._audio_interface.terminate()

    def _fill_buffer(self, in_data, frame_count, time_info, status_flags):
        """Continuously collect data from the audio stream, into the buffer."""
        self._buff.put(in_data)
        return None, pyaudio.paContinue

    def generator(self):
        while not self.closed:
            # Use a blocking get() to ensure there's at least one chunk of
            # data, and stop iteration if the chunk is None, indicating the
            # end of the audio stream.
            chunk = self._buff.get()
            if chunk is None:
                return
            data = [chunk]

            # Now consume whatever other data's still buffered.
            while True:
                try:
                    chunk = self._buff.get(block=False)
                    if chunk is None:
                        return
                    data.append(chunk)
                except queue.Empty:
                    break

            yield b"".join(data)


def listen_print_loop(responses):
    """Iterates through server responses and prints them.

    The responses passed is a generator that will block until a response
    is provided by the server.

    Each response may contain multiple results, and each result may contain
    multiple alternatives; for details, see https://goo.gl/tjCPAU.  Here we
    print only the transcription for the top alternative of the top result.

    In this case, responses are provided for interim results as well. If the
    response is an interim one, print a carriage return at the end of it, to
    allow the next result to overwrite it, until the response is a final one.
    For the final one, print a newline to preserve the finalized transcription.
    """
    num_chars_printed = 0
    for response in responses:
        if not response.results:
            continue

        # The `results` list is consecutive. For streaming, we only care about
        # the first result being considered, since once it's `is_final`, it
        # moves on to considering the next utterance.
        result = response.results[0]
        if not result.alternatives:
            continue

        # Display the transcription of the top alternative.
        transcript = result.alternatives[0].transcript

        # Display interim results, but with a carriage return at the end of the
        # line, so subsequent lines will overwrite them.
        #
        # If the previous result was longer than this one, we need to print
        # some extra spaces to overwrite the previous result
        overwrite_chars = " " * (num_chars_printed - len(transcript))

        if not result.is_final:
            sys.stdout.write(transcript + overwrite_chars + "\r")
            sys.stdout.flush()

            num_chars_printed = len(transcript)

        else:
            print(transcript + overwrite_chars)

            # Exit recognition if any of the transcribed phrases could be
            # one of our keywords.
            if re.search(r"\b(exit|quit)\b", transcript, re.I):
                print("Exiting..")
                break

            num_chars_printed = 0


def main():
    # See http://g.co/cloud/speech/docs/languages
    # for a list of supported languages.
    language_code = "zh"  # a BCP-47 language tag

    client = speech.SpeechClient()
    config = speech.RecognitionConfig(
        encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
        sample_rate_hertz=RATE,
        language_code=language_code,
    )

    streaming_config = speech.StreamingRecognitionConfig(
        config=config, interim_results=True
    )

    with MicrophoneStream(RATE, CHUNK) as stream:
        audio_generator = stream.generator()
        requests = (
            speech.StreamingRecognizeRequest(audio_content=content)
            for content in audio_generator
        )

        responses = client.streaming_recognize(streaming_config, requests)

        # Now, put the transcription responses to use.
        listen_print_loop(responses)


if __name__ == "__main__":
    main()

Speech picked up by the microphone is converted to text and printed in real time. To process the results further, modify the listen_print_loop function.
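One possible way to separate result handling from printing is a small generator that yields only finalized transcripts, which can then be fed to your own logic. This is a sketch, not the official sample: `final_transcripts` and `fake_response` are hypothetical names, and the fake objects merely mimic the attributes that `listen_print_loop` reads (`results`, `alternatives`, `transcript`, `is_final`).

```python
from types import SimpleNamespace


def final_transcripts(responses):
    """Yield the top transcript of each finalized result, skipping interim ones."""
    for response in responses:
        if not response.results:
            continue
        result = response.results[0]
        if result.is_final and result.alternatives:
            yield result.alternatives[0].transcript


def fake_response(text, is_final):
    """Stand-in with the same attribute names, for trying the loop offline."""
    alternative = SimpleNamespace(transcript=text)
    result = SimpleNamespace(alternatives=[alternative], is_final=is_final)
    return SimpleNamespace(results=[result])
```

In `main()`, `for text in final_transcripts(responses):` could replace the call to `listen_print_loop(responses)` when only the finished sentences matter.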

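The code above is based on the official sample, with these changes:

Set a proxy (within China the http_proxy proxy must be set, otherwise the Google API cannot be reached)
Set the GOOGLE_APPLICATION_CREDENTIALS environment variable. Normally this is configured in the client machine's system settings; for testing it can be set directly in code. Its value is the path to the JSON key file from the preparation steps
Set language_code to zh for Chinese. Official list of supported languages: Speech-to-Text supported languages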
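Since the environment variables are normally set at the system level and only hard-coded for testing, a gentler variant is to fill them in only when they are missing. The following is a sketch under that assumption; `configure_environment` is a hypothetical helper, and the proxy address and key path are placeholders.

```python
import os


def configure_environment(proxy_url, key_path, env=os.environ):
    """Set proxy and credential variables without overriding values already present."""
    for name in ("http_proxy", "https_proxy"):
        env.setdefault(name, proxy_url)
    env.setdefault("GOOGLE_APPLICATION_CREDENTIALS", key_path)
    names = ("http_proxy", "https_proxy", "GOOGLE_APPLICATION_CREDENTIALS")
    return {name: env[name] for name in names}
```

Calling `configure_environment("http://127.0.0.1:7890", "xxxx.json")` before creating the client has the same effect as the assignments in the example on a clean machine, while leaving any system-wide settings untouched.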

Other official samples

Google Cloud official samples

Speech-to-Text samples

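Internal computer audio

You can also set the microphone to the system audio source, so that video and speech playing on the computer are transcribed to text in real time; this makes a decent live-caption tool. For the exact steps, see [Real-time speech to text] Real-time speech-to-text on PC (external microphone audio & internal system audio). It only takes a little configuration.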
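As an alternative to changing the system default, the capture device could be chosen in code. The sketch below lists input-capable devices through the PyAudio interface; `find_input_devices` is a hypothetical helper, and the device name "Stereo Mix" is only an assumed example, since the actual name depends on the sound driver.

```python
def find_input_devices(audio_interface, keyword=""):
    """Return (index, name) pairs for input-capable devices whose name contains keyword.

    `audio_interface` is expected to behave like pyaudio.PyAudio().
    """
    matches = []
    for index in range(audio_interface.get_device_count()):
        info = audio_interface.get_device_info_by_index(index)
        if info.get("maxInputChannels", 0) < 1:
            continue  # output-only device
        if keyword.lower() in info.get("name", "").lower():
            matches.append((index, info["name"]))
    return matches

# Usage with real hardware (requires PyAudio):
#   import pyaudio
#   pa = pyaudio.PyAudio()
#   print(find_input_devices(pa, "stereo mix"))
#   pa.terminate()
```

The chosen index can then be passed as `input_device_index` to the `open()` call inside `MicrophoneStream.__enter__`.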
