Using the Google Cloud Text to Speech API from PowerShell ~ 隨手記錄

Tuesday, December 1, 2020

7:38 PM | gcp, PowerShell, text2speech

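The script has two main parts:

1. Authentication (choose one)
   - Service Account
     - Set up a service account in the GCP Console, generate a key file, and download it.
   - API Key
     - Restrict the key to the IP addresses and API services that are allowed to use it.
2. Sending the text and its settings, then receiving and processing the audio data

The Google Cloud Text to Speech API is a paid service, so every request has to be authenticated.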

Using a Service Account

# Load the Google Cloud SDK's GoogleCloud module for PowerShell
Import-Module GoogleCloud
# Set the GOOGLE_APPLICATION_CREDENTIALS (GAC) environment variable to the full path of the service account key file
$env:GOOGLE_APPLICATION_CREDENTIALS = "D:\full\path\to\key-file.json"
# Read the key and obtain a short-lived access token
$gauth = gcloud auth application-default print-access-token
# Add the token to the headers hashtable (hashtable entries use key = value)
$headers = @{ "Authorization" = "Bearer $gauth" }
$url = "https://texttospeech.googleapis.com/v1/text:synthesize"
# ContentType declares that the payload is JSON encoded as UTF-8; $JSON is the request body built further below
$response = Invoke-RestMethod -ContentType 'application/json;charset=utf-8' -Headers $headers -Uri $url -Method Post -Body $JSON

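Using an API Key

# Simply append the key parameter to the API URL
$url = "https://texttospeech.googleapis.com/v1/text:synthesize?key=YOUR_API_KEY"
# No Authorization header is needed with an API key; an empty hashtable keeps the call below unchanged
$headers = @{}
# ContentType declares that the payload is JSON encoded as UTF-8; $JSON is the request body built further below
$response = Invoke-RestMethod -ContentType 'application/json;charset=utf-8' -Headers $headers -Uri $url -Method Post -Body $JSON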

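Using an API key makes the script much simpler, and there is no need to install the Google Cloud client libraries or the Cloud SDK.
Just remember to restrict which IP addresses and services may use the key, to keep it secure.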
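
One way to keep the key string out of the script itself is to read it from an environment variable and URL-escape it before appending it to the endpoint. The following is only a sketch: the helper name Get-TtsUrl and the variable name GOOGLE_TTS_API_KEY are assumptions made for illustration and are not part of the original script.

```powershell
# Hypothetical helper: builds the synthesize URL for a given API key.
function Get-TtsUrl {
    param(
        [Parameter(Mandatory = $true)]
        [string]$ApiKey
    )
    $endpoint = 'https://texttospeech.googleapis.com/v1/text:synthesize'
    # EscapeDataString percent-encodes characters that are unsafe in a query string value.
    return '{0}?key={1}' -f $endpoint, [uri]::EscapeDataString($ApiKey)
}

# GOOGLE_TTS_API_KEY is an assumed variable name; set it before running the script.
if ($env:GOOGLE_TTS_API_KEY) {
    $url = Get-TtsUrl -ApiKey $env:GOOGLE_TTS_API_KEY
}
```

With this approach the script file can be shared or committed without exposing the key.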

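Next, send the text and its settings to the Google Cloud Text to Speech API.

The request body has to follow the expected format exactly; otherwise the API responds with an error.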

# The text to be read aloud (Chinese text, for the cmn-TW voice selected below)
$content = "The text you want Google Text to Speech to read aloud"
$AudioEncoding = "MP3"
$LanguageCode = "cmn-TW"
$VoiceName = "$LanguageCode-Wavenet-A"
$Effect = "large-home-entertainment-class-device"

$body = @{
    audioConfig = @{
        audioEncoding    = $AudioEncoding
        # effectsProfileId is a list of profile IDs, so wrap the value in an array
        effectsProfileId = @($Effect)
        pitch            = 0
        speakingRate     = 0.9   # speakingRate = 0.25 ~ 4.0
        volumeGainDb     = 6     # volumeGainDb = -96.0 ~ 16.0
    }
    input = @{}
    voice = @{
        languageCode = $LanguageCode
        name         = $VoiceName
    }
}

# If the content is not SSML (XML), the key inside input must be "text".
# If the content is SSML (XML), the key inside input must be "ssml".
# So let the script decide whether to use text or ssml.
If ($content -match '^<speak>' -and $content -match '</speak>$') {
    $body.input.Add("ssml", $content)
} Else {
    $body.input.Add("text", $content)
}

# -Depth 5 keeps the nested hashtables and the array from being flattened into strings
$JSON = ConvertTo-Json $body -Depth 5

Try {
    # Besides declaring JSON, the Content-Type must also include charset=utf-8; otherwise only the digits in a Chinese string are read aloud
    $response = Invoke-RestMethod -ContentType 'application/json;charset=utf-8' -Headers $headers -Uri $url -Method Post -Body $JSON

    # The returned audio data is Base64-encoded
    $base64Audio = $response.audioContent

    # Look up the path of the desktop folder
    $Dest = [System.Environment]::GetFolderPath([System.Environment+SpecialFolder]::Desktop)

    # Create PlayTTS.html on the desktop (the audio is embedded in the page as a Data URI; audio/mpeg matches the MP3 encoding requested above)
    "<html><head><title>Providing HTML5 audio with a base64 encoded Data URI as source</title><meta charset='utf-8'></head><body><h1>Providing HTML5 audio with a base64 encoded Data URI as source</h1><audio controls='' src='data:audio/mpeg;base64,{0}'></audio></body></html>" -f $base64Audio | Out-File -FilePath "$Dest\PlayTTS.html" -Encoding utf8

    # Open the page with the default browser via the system file association; it plays without any extra JS plug-in
    Invoke-Item "$Dest\PlayTTS.html"

    # Save the Base64 audio data as a text file
    $base64Audio | Out-File -FilePath "./google.txt" -Encoding ascii -Force

    # Name of the audio file (HH gives a 24-hour timestamp)
    $convertedFileName = 'PlayTTS-{0}.mp3' -f (Get-Date -Format 'yyyy-MM-dd-HH-mm-ss')

    # Call the built-in Windows certutil.exe to decode the Base64 file; the output goes to the desktop
    certutil -decode google.txt "$Dest\$convertedFileName"

    # Open File Explorer with the output audio file selected (the path is quoted in case it contains spaces)
    Start-Process -FilePath "$($env:WinDir)\explorer.exe" -ArgumentList "/select,`"$Dest\$convertedFileName`""
} Catch {
    # If the request fails, print the HTTP status code and description
    Write-Host "StatusCode:" $_.Exception.Response.StatusCode.value__
    Write-Host "StatusDescription:" $_.Exception.Response.StatusDescription
}
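
The temporary text file and the certutil step could also be replaced by decoding the Base64 string in-process with .NET. This is a sketch of an alternative, not the method used above; the helper name Save-Base64Audio is hypothetical, while $base64Audio, $Dest and $convertedFileName refer to the variables in the script.

```powershell
# Hypothetical helper: decodes a Base64 string and writes the bytes to a file.
function Save-Base64Audio {
    param(
        [Parameter(Mandatory = $true)][string]$Base64,
        [Parameter(Mandatory = $true)][string]$Path
    )
    # FromBase64String throws a FormatException if the input is not valid Base64.
    $bytes = [System.Convert]::FromBase64String($Base64)
    # WriteAllBytes creates the file, or overwrites it if it already exists.
    # Pass an absolute path: .NET resolves relative paths against the process
    # working directory, which can differ from the current PowerShell location.
    [System.IO.File]::WriteAllBytes($Path, $bytes)
    # Return the number of bytes written.
    return $bytes.Length
}

# Possible usage inside the Try block, in place of the Out-File and certutil lines:
# Save-Base64Audio -Base64 $base64Audio -Path "$Dest\$convertedFileName"
```

Unlike certutil -decode, this does not leave a google.txt file behind and does not fail when the target file already exists.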

audioConfig reference
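
Since the API rejects out-of-range values, it may help to check the numeric audioConfig settings before sending the request. The sketch below uses the ranges noted in the script comments (speakingRate 0.25 to 4.0, volumeGainDb -96.0 to 16.0) plus the documented pitch range of -20.0 to 20.0; the helper name Test-AudioConfig is hypothetical.

```powershell
# Hypothetical helper: returns a list of problems found in an audioConfig hashtable.
function Test-AudioConfig {
    param([Parameter(Mandatory = $true)][hashtable]$AudioConfig)

    # Allowed range (minimum, maximum) for each numeric setting.
    $ranges = @{
        speakingRate = @(0.25, 4.0)
        volumeGainDb = @(-96.0, 16.0)
        pitch        = @(-20.0, 20.0)
    }
    $problems = @()
    foreach ($name in $ranges.Keys) {
        # Settings that are absent are skipped; the API applies its defaults.
        if ($AudioConfig.ContainsKey($name)) {
            $value = [double]$AudioConfig[$name]
            $min, $max = $ranges[$name]
            if ($value -lt $min -or $value -gt $max) {
                $problems += "$name = $value is outside $min to $max"
            }
        }
    }
    # The leading comma keeps the result an array even with zero or one item.
    return ,$problems
}

# Example: an empty result means every checked value is in range.
$issues = Test-AudioConfig -AudioConfig @{ speakingRate = 0.9; volumeGainDb = 6; pitch = 0 }
```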

SSML reference
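
To illustrate the text versus ssml choice, here is a small sketch with a sample SSML string. The helper name New-TtsInput is hypothetical, and its check is slightly looser than the one in the script: it trims surrounding whitespace and also accepts a <speak> tag that carries attributes.

```powershell
# Hypothetical helper: wraps the content in the input object the API expects.
function New-TtsInput {
    param([Parameter(Mandatory = $true)][string]$Content)
    $trimmed = $Content.Trim()
    # SSML must start with a <speak> element and end with </speak>.
    if ($trimmed -match '^<speak[\s>]' -and $trimmed -match '</speak>$') {
        return @{ ssml = $trimmed }
    }
    return @{ text = $Content }
}

# <break> inserts a pause; <say-as> controls how a value is read aloud.
$ssml = '<speak>Hello<break time="500ms"/>the code is <say-as interpret-as="characters">GCP</say-as></speak>'
$ssmlInput = New-TtsInput -Content $ssml          # hashtable with an ssml key
$textInput = New-TtsInput -Content 'Plain text'   # hashtable with a text key
```

The returned hashtable could be assigned directly to the input entry of the request body before calling ConvertTo-Json.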
