In this guide, I will show how to deploy Text-To-Speech via webhooks to a Google Nest device, Chromecast device, or any other supported HA media player device, using Home Assistant. This enables customized Text-To-Speech from any application capable of performing HTTP requests.
Prerequisites:
- Google Nest, Chromecast, or other supported HA media player device.
- Device running Home Assistant. (Refer to HA’s installation page)
- Device for configuring HA, and issuing HTTP requests.
This guide assumes a working fresh install of Home Assistant.
Set up access to the config of Home Assistant
Configuration of Home Assistant is done using .yaml files. The main file is referred to as configuration.yaml. From this file, several .yaml subfiles are included.
To modify the .yaml files, we will use the method recommended on the Editing configuration.yaml page of the Home Assistant docs:
Install the Studio Code Server add-on in Home Assistant. Once installation has completed, start the add-on and launch its web interface. This makes a version of Visual Studio Code available in the browser, with access to the Home Assistant configuration files.
We will need this later.
Set up an automation for the TTS action
Start by navigating to Settings -> Automations & Scenes and press + Create Automation in the lower right-hand corner.
Next, select Create new automation.
Next, under the When section, add a trigger and select Other triggers.
Scroll to the bottom, and select Webhook.
Now the webhook that will trigger the TTS action can be configured.
The Webhook ID will be used as part of the URL to access the webhook, using the following format: http://[IP address of HA]:[Port]/api/webhook/[Webhook ID]
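As a small illustration of the URL format, here is a minimal Python sketch that assembles the address from its parts. The function name webhook_url and the example address 192.168.1.10 are made up for this guide; 8123 is the port Home Assistant listens on by default, so adjust it if your install differs.

```python
from urllib.parse import quote


def webhook_url(host: str, webhook_id: str, port: int = 8123) -> str:
    """Return the webhook URL for a Home Assistant instance.

    host and webhook_id are placeholders for your own values;
    8123 is Home Assistant's default port.
    """
    # Percent-encode the ID in case it contains characters that are not URL-safe.
    return f"http://{host}:{port}/api/webhook/{quote(webhook_id, safe='')}"


# Example with a hypothetical local address and the webhook ID "tts".
print(webhook_url("192.168.1.10", "tts"))
```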
You can name this anything you want. HA creates a random ID by default to obfuscate the webhook endpoints, so that malicious actors cannot abuse the webhooks. Refer to HA docs for more security info.
Using the settings on the right side (gear icon), it’s possible to define the HTTP methods to be made available for the webhook. Additionally, it’s possible to limit usage of the webhook to devices from within the local network.
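If you replace the generated ID with one of your own, it is worth keeping it just as hard to guess. One possible way to produce such an ID is sketched below using Python's standard secrets module; the helper name new_webhook_id and the length of 32 random bytes are my own choices, not something Home Assistant requires.

```python
import secrets


def new_webhook_id(nbytes: int = 32) -> str:
    """Return a random, URL-safe string suitable for use as a Webhook ID."""
    # token_urlsafe only emits letters, digits, '-' and '_',
    # so the result can be pasted into the URL as-is.
    return secrets.token_urlsafe(nbytes)


print(new_webhook_id())
```

Paste the printed value into the Webhook ID field. This guide keeps the short ID tts for readability, which is easier to type but also easier to guess.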
Now to set up the Then do section of the automation:
Press the + Add Action button, then search for Text-to-speech (TTS): Speak and select it. Alternatively, locate the entry under Other actions -> Text-to-speech (TTS) -> Speak.
Next, press + Choose entity and select Google en com / tts.google.en.com.
Select the media player that should play the generated Text-to-speech audio, in this case the Google Nest device.
The message parameter is not important for now, as we will edit the configuration to use an incoming message from the webhook shortly.
Choose whether identical messages should be cached (this takes up space on the HA device).
You can specify a language, using the ISO-639 code, according to the supported languages.
Editing the automation definition
Navigate to Settings -> Add-ons -> Studio Code Server -> Open Web UI.
Next, in the left panel in VS Code, open the automations.yaml file.
Your newly created automation should look like the following example:
- id: '1704866261238'
  alias: My TTS Automation
  description: ''
  trigger:
  - platform: webhook
    allowed_methods:
    - POST
    local_only: true
    webhook_id: tts
  condition: []
  action:
  - service: tts.speak
    metadata: {}
    data:
      cache: true
      media_player_entity_id: media_player.nest
      message: Placeholder Text
      language: en
    target:
      entity_id: tts.google_en_com
  mode: single
Next, move the message and language lines to a new key named data_template, and add an entity_id field with the same value as media_player_entity_id.
Change the message value to "{{ trigger.json.message }}" and the language value to "{{ trigger.json.lang }}".
Example of a finished automation configuration:
- id: "1704866261238"
  alias: My TTS Automation
  description: ""
  trigger:
    - platform: webhook
      allowed_methods:
        - POST
      local_only: true
      webhook_id: tts
  action:
    - service: tts.speak
      metadata: {}
      data:
        cache: true
        media_player_entity_id: media_player.nest
      data_template:
        entity_id: media_player.nest
        message: "{{ trigger.json.message }}"
        language: "{{ trigger.json.lang }}"
      target:
        entity_id: tts.google_en_com
  mode: single
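To make the two templates more concrete: trigger.json is the parsed JSON body of the incoming request, so trigger.json.message and trigger.json.lang simply pick out the message and lang keys. The Python sketch below mimics that lookup outside of Home Assistant. It is only an illustration of which value ends up where, not how Home Assistant renders templates internally, and the function name tts_fields_from_body is made up.

```python
import json


def tts_fields_from_body(body: str) -> dict:
    """Mimic the two template lookups on a raw JSON request body."""
    trigger_json = json.loads(body)  # stands in for trigger.json
    return {
        "message": trigger_json["message"],  # {{ trigger.json.message }}
        "language": trigger_json["lang"],    # {{ trigger.json.lang }}
    }


print(tts_fields_from_body('{"message": "Hello World!", "lang": "en"}'))
```

In this sketch a missing key raises a KeyError. Home Assistant handles a missing key in its own way, so the safest approach is to always send both keys.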
When you are done editing, save the file in VS Code.
Then, navigate to Developer Tools -> YAML, and click AUTOMATIONS to reload the automations.
Querying the webhook to trigger the automation
Now you can query the webhook at http://[IP address of HA]:[Port]/api/webhook/[Webhook ID]
The webhook takes a JSON object in the body with the following format:
{
  "message": "Hello World! I can speak!",
  "lang": "en"
}
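Substitute en for any of the languages supported by the TTS entity selected earlier (the Google en com entity is provided by the Google Translate text-to-speech integration).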
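Any HTTP client will do for sending the request. As one possible example, here is a minimal Python sketch that uses only the standard library. The function names build_tts_request and speak, and the address http://192.168.1.10:8123, are placeholders I made up; the webhook ID tts and the JSON keys match the automation in this guide.

```python
import json
import urllib.request


def build_tts_request(base_url: str, webhook_id: str,
                      message: str, lang: str = "en") -> urllib.request.Request:
    """Build (but do not send) the POST request for the TTS webhook."""
    body = json.dumps({"message": message, "lang": lang}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/webhook/{webhook_id}",
        data=body,
        # Declare the body as JSON so the automation can read it via trigger.json.
        headers={"Content-Type": "application/json"},
        method="POST",  # the automation above only allows POST
    )


def speak(base_url: str, webhook_id: str, message: str,
          lang: str = "en", timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code."""
    request = build_tts_request(base_url, webhook_id, message, lang)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Calling speak("http://192.168.1.10:8123", "tts", "Hello World! I can speak!") from a machine on the same network should then make the media player speak. Because the automation sets local_only: true, requests from outside the local network are not expected to trigger it.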
Happy TTS’ing!
/ eldahl