Microsoft Azure Cognitive Services – Speech API

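This article explains how to create a Speech API key for Microsoft Azure Cognitive Services, then uses two short programs to convert speech input to text and text to speech. You can call them from a Raspberry Pi to build all kinds of voice interactions. Both examples are based on Microsoft's official documentation.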

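Microsoft Azure Cognitive Services

Microsoft Azure Cognitive Services has long been one of our favorite sources of examples: the web interface is pleasant and the services are not hard to use. Once you have a key, you simply call the REST API. Today we will use the Speech service.

Three years ago we did a project that combined the LinkIt 7688 with the Cognitive Services Face API; please take a look at it for reference.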

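How to create a Speech API service in Azure

This section explains how to create a Speech API service in Azure. There are two ways to do it: create a Speech service in Azure, or apply for a seven-day free key.

First sign in to the Azure portal, click [Create a resource] on the left, and search for [speech].

Then fill in the basic settings: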

 

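  • Name: for example MySpeechService. You will need this name in the code later.
  • Subscription: filled in automatically, nothing to enter
  • Location: West US. This determines the hostname of the API server used later. If you choose Central US it becomes centralus, and so on.
  • Pricing tier: F0 / S0 -> choose S0
  • Resource group: your own choice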
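To make the effect of the Location setting concrete, here is a minimal sketch that builds the two REST URLs used by the text-to-speech sample later in this article from a region short name. The helper name `speech_endpoints` is our own invention for illustration; the URL patterns are copied from that sample, and we assume other regions follow the same pattern.

```python
def speech_endpoints(region):
    """Return the token and TTS URLs for a region short name such as 'westus'.

    Hypothetical helper: the URL patterns come from the TTS sample in this
    article; applying them to other regions is an assumption.
    """
    region = region.strip().lower()
    if not region or not region.isalnum():
        raise ValueError("region must be a short name such as 'westus'")
    return {
        "token": "https://{}.api.cognitive.microsoft.com/sts/v1.0/issueToken".format(region),
        "tts": "https://{}.tts.speech.microsoft.com/cognitiveservices/v1".format(region),
    }


if __name__ == "__main__":
    # Print the URLs you would use if the resource were created in Central US.
    for name, url in speech_endpoints("centralus").items():
        print(name, url)
```

If you create the resource somewhere other than West US, remember to change both URLs in the sample, not just one of them.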

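Finally, click Create. After a short wait the deployment completes. Click [Go to resource] to see the details of the service.

Then click [Keys] on this page to see the two keys for the service. Either one works. You need to put the key in your code for the calls to succeed.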
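Rather than pasting the key directly into every script, one option is to read it from an environment variable. The sketch below assumes the variable is called SPEECH_SERVICE_KEY, the same name the text-to-speech sample in this article uses; the function name `load_speech_key` is hypothetical.

```python
import os


def load_speech_key(env=os.environ, var_name="SPEECH_SERVICE_KEY"):
    """Return the key stored in the given environment mapping.

    Hypothetical helper: raises RuntimeError when the variable is missing
    or empty, so the script fails before making any network call.
    """
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            "Set the {} environment variable to one of your two keys.".format(var_name)
        )
    return key


if __name__ == "__main__":
    try:
        key = load_speech_key()
        print("Key loaded, {} characters long.".format(len(key)))
    except RuntimeError as err:
        print(err)
```

On Windows you can set the variable in the current prompt with `set SPEECH_SERVICE_KEY=...` before running your script.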

 

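Applying for a seven-day free key

If you do not have a full Azure account, or just want to try things out, you can apply for a seven-day free key. It is used in exactly the same way as the key above, but it stops working once it expires.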

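Setting up the computer environment

Refer to this article to create an Anaconda Python 3.7 virtual environment on your computer. The steps, in brief:

  1. Create a working folder, for example C:\testAI or D:\testAI
  2. Install the Python 3.7 version of Anaconda
  3. From the program list, open Anaconda Prompt and create the virtual environment
  4. When it finishes you will see a prompt that shows the name testAI. Enter all later commands there.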
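Once the environment is active, a quick check like the following can tell you whether the packages imported by the two examples are present. This is only a sketch: the module list is taken from the import lines of the listings in this article, and the helper names are our own.

```python
import importlib.util

# Module name -> pip package name, for the modules imported by the two examples.
REQUIRED = {
    "azure.cognitiveservices.speech": "azure-cognitiveservices-speech",
    "requests": "requests",
    "playsound": "playsound",
}


def is_installed(module_name):
    """Return True if the module can be found without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises when a parent package (e.g. 'azure') is missing.
        return False


def missing_packages(required=REQUIRED):
    """Return the pip names of every required module that is not installed."""
    return [pip_name for module, pip_name in required.items() if not is_installed(module)]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages, try: pip install " + " ".join(missing))
    else:
        print("All packages used by the two examples are installed.")
```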

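Example 1: Speech input to text

This example opens the microphone on your device and prints the recognition result to the console. Make sure the microphone works, then say something. The system transcribes the audio as English and displays the text. The code is at the end of this section.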

python quickstart.py

MS Speech API - speech input to text
import azure.cognitiveservices.speech as speechsdk

# Creates an instance of a speech config with specified subscription key and service region.
# Replace with your own subscription key and service region (e.g., "westus").
speech_key, service_region = "YourSubscriptionKey", "westus"
speech_config = speechsdk.SpeechConfig(subscription=speech_key, region=service_region)

# Creates a recognizer with the given settings
speech_recognizer = speechsdk.SpeechRecognizer(speech_config=speech_config)

print("Say something...")


# Starts speech recognition, and returns after a single utterance is recognized. The end of a
# single utterance is determined by listening for silence at the end or until a maximum of 15
# seconds of audio is processed. The task returns the recognition text as result.
# Note: Since recognize_once() returns only a single utterance, it is suitable only for single
# shot recognition like command or query.
# For long-running multi-utterance recognition, use start_continuous_recognition() instead.
result = speech_recognizer.recognize_once()

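# result is a SpeechRecognitionResult object; the transcript is in result.text
# (an empty string when nothing was recognized).
if 'hello' in result.text.lower():
    print("hello")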

# Checks result.
if result.reason == speechsdk.ResultReason.RecognizedSpeech:
    print("Recognized: {}".format(result.text))
elif result.reason == speechsdk.ResultReason.NoMatch:
    print("No speech could be recognized: {}".format(result.no_match_details))
elif result.reason == speechsdk.ResultReason.Canceled:
    cancellation_details = result.cancellation_details
    print("Speech Recognition canceled: {}".format(cancellation_details.reason))
    if cancellation_details.reason == speechsdk.CancellationReason.Error:
        print("Error details: {}".format(cancellation_details.error_details))
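For voice interaction on something like a Raspberry Pi, the recognized text usually needs to be mapped to an action. The sketch below shows one simple way to do that with substring matching on the transcript. The phrases and action names are purely hypothetical, and the function works on a plain string, so you would pass it `result.text` from the listing above.

```python
# Hypothetical phrase -> action table; replace with whatever your project needs.
COMMANDS = {
    "hello": "greet",
    "light on": "turn_light_on",
    "light off": "turn_light_off",
}


def match_command(recognized_text, commands=COMMANDS):
    """Return the action for the first phrase found in the text, or None."""
    text = recognized_text.lower()
    for phrase, action in commands.items():
        if phrase in text:
            return action
    return None


if __name__ == "__main__":
    # Stand-in strings; in the real program this would be result.text.
    for sample in ("Hello there.", "Please turn the light on.", "What time is it?"):
        print(repr(sample), "->", match_command(sample))
```

Substring matching is crude (it cannot tell "light on" from "flashlight on", for example), but it is often enough for a handful of short commands.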

 

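Example 2: Text to speech file

The second example runs the opposite flow. The (English) text you type in the console is sent to the Cognitive Services server, and the audio that comes back is saved as a .wav file. You can then play the file with a package such as pyaudio; the listing below uses playsound.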

python TTSSample.py

MS Speech API - text to speech output
'''
After you've set your subscription key, run this application from your working
directory with this command: python TTSSample.py
'''
import os, requests, time
from xml.etree import ElementTree
from playsound import playsound

# This code is required for Python 2.7
try:
    input = raw_input
except NameError:
    pass

'''
If you prefer, you can hardcode your subscription key as a string and remove
the provided conditional statement. However, we do recommend using environment
variables to secure your subscription keys. The environment variable is
set to SPEECH_SERVICE_KEY in our sample.

For example:
subscription_key = "Your-Key-Goes-Here"
'''

if 'SPEECH_SERVICE_KEY' in os.environ:
    subscription_key = os.environ['SPEECH_SERVICE_KEY']
else:
    print('Environment variable for your subscription key is not set.')
    exit()

class TextToSpeech(object):
    def __init__(self, subscription_key):
        self.subscription_key = subscription_key
        self.tts = input("What would you like to convert to speech: ")
        self.timestr = time.strftime("%Y%m%d-%H%M")
        self.access_token = None

    def get_token(self):
        '''
        The TTS endpoint requires an access token. This method exchanges your
        subscription key for an access token that is valid for ten minutes.
        '''
        fetch_token_url = "https://westus.api.cognitive.microsoft.com/sts/v1.0/issueToken"
        headers = {
            'Ocp-Apim-Subscription-Key': self.subscription_key
        }
        response = requests.post(fetch_token_url, headers=headers)
        self.access_token = str(response.text)

    def save_audio(self):
        base_url = 'https://westus.tts.speech.microsoft.com/'
        path = 'cognitiveservices/v1'
        constructed_url = base_url + path
        headers = {
            'Authorization': 'Bearer ' + self.access_token,
            'Content-Type': 'application/ssml+xml',
            'X-Microsoft-OutputFormat': 'riff-24khz-16bit-mono-pcm',
            'User-Agent': 'YOUR_RESOURCE_NAME'
        }
        # Build the SSML request body: <speak><voice>text</voice></speak>
        xml_body = ElementTree.Element('speak', version='1.0')
        xml_body.set('{http://www.w3.org/XML/1998/namespace}lang', 'en-us')
        voice = ElementTree.SubElement(xml_body, 'voice')
        voice.set('{http://www.w3.org/XML/1998/namespace}lang', 'en-US')
        voice.set('name', 'en-US-Guy24kRUS') # Short name for 'Microsoft Server Speech Text to Speech Voice (en-US, Guy24KRUS)'
        voice.text = self.tts
        body = ElementTree.tostring(xml_body)

        response = requests.post(constructed_url, headers=headers, data=body)
        # If a success response is returned, the binary audio is written to
        # sample-001.wav in your working directory. The commented-out line
        # below uses a timestamped file name instead.
        if response.status_code == 200:
            with open('sample-001.wav', 'wb') as audio:
            #with open('sample-' + self.timestr + '.wav', 'wb') as audio:
                audio.write(response.content)
                print("\nStatus code: " + str(response.status_code) + "\nYour TTS is ready for playback.\n")
        else:
            print("\nStatus code: " + str(response.status_code) + "\nSomething went wrong. Check your subscription key and headers.\n")

if __name__ == "__main__":
    app = TextToSpeech(subscription_key)
    app.get_token()
    app.save_audio()
    playsound('sample-001.wav')
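The sample requests a fresh access token on every run. Since the comment in the listing says a token is valid for ten minutes, a long-running program (a Raspberry Pi voice assistant, say) could reuse it instead. Below is a minimal sketch of that idea. The `TokenCache` class is hypothetical, the nine-minute refresh margin is our own choice, and the fetch function is passed in so the cache itself makes no network calls.

```python
import time


class TokenCache:
    """Reuse a token until it is close to expiring, then fetch a new one.

    Hypothetical helper: fetch_token is any zero-argument function that
    returns a token string, for example one that wraps the issueToken
    request from the sample above.
    """

    def __init__(self, fetch_token, lifetime_seconds=9 * 60, clock=time.monotonic):
        self._fetch_token = fetch_token
        self._lifetime = lifetime_seconds  # refresh one minute before the 10-minute limit
        self._clock = clock
        self._token = None
        self._expires_at = 0.0

    def get(self):
        """Return a cached token, fetching a new one if needed."""
        now = self._clock()
        if self._token is None or now >= self._expires_at:
            self._token = self._fetch_token()
            self._expires_at = now + self._lifetime
        return self._token


if __name__ == "__main__":
    # Stand-in fetch function; a real one would call the token endpoint.
    cache = TokenCache(lambda: "dummy-token")
    print(cache.get())
    print(cache.get())  # served from the cache, no second fetch
```

Passing the clock in as a parameter is only there to make the class easy to test; in normal use you would leave the default.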
