Microsoft Azure Cognitive Services

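This article explains how to create a speech API key for Microsoft Azure Cognitive Services, then uses two short programs to convert speech input to text and text to speech. You can call them from a Raspberry Pi to build all kinds of voice interactions. Both examples are adapted from Microsoft's official documentation.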

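Microsoft Azure Cognitive Services has long been one of our favorite sources of examples: the web portal is pleasant to work with, and the services are not hard to use. Once you have a key, you simply call the REST API. Today we will use the Speech service.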

Three years ago we did a project that combined the LinkIt 7688 with the Cognitive Services Face API; please refer to it as well.

How to create a Speech API service in Azure

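This section explains how to create a Speech API service in Azure. There are two ways to do it: create a Speech service in Azure, or apply for a seven-day free key.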

First sign in to the Azure portal, click [Create a resource] on the left, and search for [speech].

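Next, fill in the basic settings:

Name: for example MySpeechService; you will need this later in the code
Subscription: filled in automatically, nothing to enter
Location: West US. This determines the name of the API server used later; if you choose Central US it becomes centralus, and so on
Pricing tier: F0 / S0 -> choose S0
Resource: your own choice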
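To make the effect of the Location setting concrete, here is a minimal sketch that derives the two server URLs from a region name. The URL patterns are the ones that appear in Example 2 of this article with westus swapped out; the helper name speech_endpoints is our own and not part of any Microsoft library.

```python
def speech_endpoints(region):
    """Return the token and text-to-speech URLs for an Azure region name."""
    # Region names are lowercase with no spaces, e.g. "westus" or "centralus".
    region = region.strip().lower()
    return {
        "token": "https://{}.api.cognitive.microsoft.com/sts/v1.0/issueToken".format(region),
        "tts": "https://{}.tts.speech.microsoft.com/cognitiveservices/v1".format(region),
    }


if __name__ == "__main__":
    for name, url in speech_endpoints("centralus").items():
        print(name, url)
```

If you created the service in a region other than West US, the same substitution applies to the hard-coded URLs in the sample code.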

Finally, click Create. After a short wait the deployment completes. Click [Go to resource] to see the details of the service.

Then click [Keys] on this page to see the service's two keys. Either one works; you need to put the key in your code for the calls to succeed.
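Rather than pasting the key directly into a script, you may prefer to keep it in an environment variable, as the second example in this article does with SPEECH_SERVICE_KEY. The following is only a sketch of that idea; the function name load_speech_key is hypothetical.

```python
import os


def load_speech_key(environ=os.environ, name="SPEECH_SERVICE_KEY"):
    """Return the subscription key stored in an environment variable."""
    key = environ.get(name, "").strip()
    if not key:
        raise RuntimeError("Environment variable {} is not set.".format(name))
    return key


if __name__ == "__main__":
    try:
        key = load_speech_key()
        # Show only the last four characters so the full key is never printed.
        print("Key loaded, ending in ..." + key[-4:])
    except RuntimeError as err:
        print(err)
```

This keeps the key out of any code you share or publish.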

Applying for a seven-day free key

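If you do not have a regular Azure account, or just want to try the service out, you can apply for a seven-day free key. It is used exactly the same way as above, but it stops working once it expires.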

Setting up the computer environment

Please refer to this article to create an Anaconda Python 3.7 virtual environment on your computer. In brief, the steps are:

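Create a working folder, for example C:\testAI or D:\testAI
Install Anaconda with Python 3.7
From the Programs menu, open the Anaconda prompt and create a virtual environment
When this completes you will see a prompt labeled testAI; enter all subsequent commands there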
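Once the environment is active, a quick check like the one below can confirm that the modules imported by the two examples are available. This is just a sketch; missing_modules is a name we made up. As far as we know, the Speech SDK is published on pip as azure-cognitiveservices-speech, and requests and playsound are installed under their own names.

```python
import importlib
import sys


def missing_modules(names):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    # Modules imported by the two examples in this article.
    needed = ["azure.cognitiveservices.speech", "requests", "playsound"]
    for name in missing_modules(needed):
        print("Missing module:", name)
```

If nothing is reported as missing, the environment should be ready for both examples.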

Example 1: Speech input to text

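This example opens the device's microphone and prints the recognition result to the console. Make sure the microphone works, then say something; the system recognizes the audio as English, converts it to text, and displays it. The code is listed at the end of this section.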

python quickstart.py

MS Speech API - speech input to text

import azure.cognitiveservices.speech as speechsdk

# Creates an instance of a speech config with specified subscription key and service region.
# Replace with your own subscription key and service region (e.g., "westus").
speech_key, service_region = "YourSubscriptionKey", "westus"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

# Creates a recognizer with the given settings
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

print("Say something...")


# Starts speech recognition, and returns after a single utterance is recognized. The end of a
# single utterance is determined by listening for silence at the end or until a maximum of 15
# seconds of audio is processed.  The task returns the recognition text as result. 
# Note: Since recognize_once() returns only a single utterance, it is suitable only for single
# shot recognition like command or query. 
# For long-running multi-utterance recognition, use start_continuous_recognition() instead.
result = speech_recognizer.recognize_once()

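# result is a SpeechRecognitionResult object; the recognized string is in result.text.
if result.reason == speechsdk.ResultReason.RecognizedSpeech and 'hello' in result.text.lower():
    print("hello")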

# Checks result.
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))
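For voice interaction on a Raspberry Pi, the recognized text usually has to be mapped to an action. The sketch below extends the 'hello' check in the listing above to a small command table. It assumes the recognized text may come back capitalized and with punctuation; match_command and the command table are hypothetical names for illustration only.

```python
import string


def match_command(text, commands):
    """Return the action for the first command word found in `text`, else None."""
    # Drop punctuation and case so "Hello." matches "hello".
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    words = cleaned.split()
    for keyword, action in commands.items():
        if keyword in words:
            return action
    return None


if __name__ == "__main__":
    # Hypothetical command table for a Raspberry Pi project.
    commands = {"hello": "greet", "light": "toggle_light"}
    print(match_command("Hello.", commands))
    print(match_command("Turn on the light, please", commands))
```

Matching on whole words rather than substrings avoids triggering "hello" when someone says "Othello". In the listing above you would pass result.text to this function.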

Example 2: Text to speech file

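The second example runs the flow in reverse. The text (in English) that you type in the console is sent to the Cognitive Services server, and the returned audio is saved as a .wav file. You can then play the file with a package such as pyaudio; the sample below uses playsound.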
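The text is not sent as a plain string: the sample wraps it in an SSML document that names the voice to use. As a rough sketch of that step on its own, the helper below builds the same kind of body as the sample's save_audio method. The function name build_ssml and its default arguments are our own choices, with the voice name taken from the sample.

```python
from xml.etree import ElementTree

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_ssml(text, voice_name="en-US-Guy24kRUS", lang="en-US"):
    """Build the SSML request body (bytes) for one piece of text."""
    speak = ElementTree.Element("speak", version="1.0")
    speak.set(XML_LANG, lang)
    voice = ElementTree.SubElement(speak, "voice")
    voice.set(XML_LANG, lang)
    voice.set("name", voice_name)
    # ElementTree escapes characters such as & and < when serializing.
    voice.text = text
    return ElementTree.tostring(speak)


if __name__ == "__main__":
    print(build_ssml("Tom & Jerry").decode("utf-8"))
```

Building the body with ElementTree, instead of string concatenation, means that typed text containing & or < still produces well-formed XML.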

python TTSSample.py

MS Speech API - text to speech output

'''
After you've set your subscription key, run this application from your working
directory with this command: python TTSSample.py
'''
import os, requests, time
from xml.etree import ElementTree
from playsound import playsound

# This code is required for Python 2.7
try: input = raw_input
except NameError: pass

'''
If you prefer, you can hardcode your subscription key as a string and remove
the provided conditional statement. However, we do recommend using environment
variables to secure your subscription keys. The environment variable is
set to SPEECH_SERVICE_KEY in our sample.

For example:
subscription_key = "Your-Key-Goes-Here"
'''

if 'SPEECH_SERVICE_KEY' in os.environ:
    subscription_key = os.environ['SPEECH_SERVICE_KEY']
else:
    print('Environment variable for your subscription key is not set.')
    exit()

class TextToSpeech(object):
    def __init__(self, subscription_key):
        self.subscription_key = subscription_key
        self.tts = input("What would you like to convert to speech: ")
        self.timestr = time.strftime("%Y%m%d-%H%M")
        self.access_token = None

    '''
    The TTS endpoint requires an access token. This method exchanges your
    subscription key for an access token that is valid for ten minutes.
    '''
    def get_token(self):
        fetch_token_url = "https://westus.api.cognitive.microsoft.com/sts/v1.0/issueToken"
        headers = {
            'Ocp-Apim-Subscription-Key': self.subscription_key
        }
        response = requests.post(fetch_token_url, headers=headers)
        self.access_token = str(response.text)

    def save_audio(self):
        base_url = 'https://westus.tts.speech.microsoft.com/'
        path = 'cognitiveservices/v1'
        constructed_url = base_url + path
        headers = {
            'Authorization': 'Bearer ' + self.access_token,
            'Content-Type': 'application/ssml+xml',
            'X-Microsoft-OutputFormat': 'riff-24khz-16bit-mono-pcm',
            'User-Agent': 'YOUR_RESOURCE_NAME'
        }
        xml_body = ElementTree.Element('speak', version='1.0')
        xml_body.set('{http://www.w3.org/XML/1998/namespace}lang', 'en-us')
        voice = ElementTree.SubElement(xml_body, 'voice')
        voice.set('{http://www.w3.org/XML/1998/namespace}lang', 'en-US')
        voice.set('name', 'en-US-Guy24kRUS') # Short name for 'Microsoft Server Speech Text to Speech Voice (en-US, Guy24KRUS)'
        voice.text = self.tts
        body = ElementTree.tostring(xml_body)

        response = requests.post(constructed_url, headers=headers, data=body)
        '''
        If a success response is returned, then the binary audio is written
        to sample-001.wav in your working directory. Use the commented-out
        line below instead to include the date and time in the file name.
        '''
        if response.status_code == 200:
            with open('sample-001.wav', 'wb') as audio:
            #with open('sample-' + self.timestr + '.wav', 'wb') as audio:
                audio.write(response.content)
                print("\nStatus code: " + str(response.status_code) + "\nYour TTS is ready for playback.\n")
        else:
            print("\nStatus code: " + str(response.status_code) + "\nSomething went wrong. Check your subscription key and headers.\n")

if __name__ == "__main__":
    app = TextToSpeech(subscription_key)
    app.get_token()
    app.save_audio()
    playsound('sample-001.wav')
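If playback sounds wrong, it can help to inspect the saved file before blaming the player. The sketch below uses Python's standard wave module to read the header of a PCM .wav file; describe_wav is a hypothetical helper. The reported duration relies on the sizes written in the file header, so treat it as approximate if the service streams the audio.

```python
import os
import wave


def describe_wav(path):
    """Return (channels, bits per sample, sample rate, seconds) for a PCM .wav file."""
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        bits = wav.getsampwidth() * 8
        rate = wav.getframerate()
        seconds = wav.getnframes() / float(rate)
    return channels, bits, rate, seconds


if __name__ == "__main__":
    if os.path.exists("sample-001.wav"):
        # riff-24khz-16bit-mono-pcm should report 1 channel, 16 bits, 24000 Hz.
        print(describe_wav("sample-001.wav"))
```

The three header values should line up with the X-Microsoft-OutputFormat header requested in the sample.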
