In this guide, I will show how to deploy Text-To-Speech (TTS) via webhooks to a Google Nest device, a Chromecast device, or any other media player supported by Home Assistant (HA). This enables customized Text-To-Speech from any application capable of performing HTTP requests.

Prerequisites:

  • Google Nest, Chromecast, or other supported HA media player device.
  • Device running Home Assistant. (Refer to HA’s installation page)
  • Device for configuring HA, and issuing HTTP requests.

This guide assumes a working fresh install of Home Assistant.

Set up access to the config of Home Assistant

Configuration of Home Assistant is done using .yaml files. The main file is referred to as configuration.yaml. From this file, several .yaml subfiles are included.

To modify the .yaml files, we will make use of the recommended method from the Editing configuration.yaml page from the Home Assistant docs:

Install the Studio Code Server add-on in Home Assistant. Once installation has completed, start the add-on and launch its web interface. This makes a version of Visual Studio Code available in the browser, with access to the Home Assistant configuration files.

We will need this later.

Set up an automation for the TTS action

Start by navigating to Settings -> Automations & Scenes and press the + Create Automation button in the lower right-hand corner.

Next, select Create new automation.

Next, under the When section, add a trigger and select Other triggers.

Scroll to the bottom and select Webhook.

Now the webhook that will trigger the TTS action can be configured. The Webhook ID is used as part of the URL to access the webhook, in the following format: http://[IP address of HA]:[Port]/api/webhook/[Webhook ID].
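
As a minimal sketch of the URL format above, the following Python helper assembles the webhook URL from its parts. The function name webhook_url, the example address 192.168.1.10, and the default port 8123 are assumptions for illustration; substitute the values of your own installation.

```python
from urllib.parse import quote


def webhook_url(host: str, webhook_id: str, port: int = 8123, scheme: str = "http") -> str:
    """Build a webhook URL of the form scheme://host:port/api/webhook/webhook_id.

    The default port 8123 is an assumption; use the port your HA instance listens on.
    """
    # Percent-encode the ID so characters such as spaces or slashes
    # cannot change the path of the request.
    return f"{scheme}://{host}:{port}/api/webhook/{quote(webhook_id, safe='')}"


# Hypothetical address and Webhook ID, for illustration only.
print(webhook_url("192.168.1.10", "tts"))
# prints http://192.168.1.10:8123/api/webhook/tts
```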

You can set the Webhook ID to anything you want. By default, HA generates a random ID to obfuscate the webhook endpoint, which makes it harder for malicious actors to guess and abuse it. Refer to the HA docs for more security info.

Using the settings on the right side (gear icon), it’s possible to define the HTTP methods to be made available for the webhook. Additionally, it’s possible to limit usage of the webhook to devices from within the local network.

Now to set up the Then do section of the automation:

Press the + Add Action button, then search for Text-to-speech (TTS): Speak and select it. Alternatively, locate the entry under Other actions -> Text-to-speech (TTS) -> Speak.

Next, press + Choose entity and select Google en com (tts.google_en_com).

Select the media player that should play the generated Text-to-speech audio; in this case, the Google Nest device.

The message parameter is not important for now, as we will edit the configuration to use an incoming message from the webhook shortly.

Choose whether to cache generated messages, so that repeated messages are not generated again. (Cached audio will take up space on the HA device.)

You can specify a language, using the ISO-639 code, according to the supported languages.

Editing the automation definition

Navigate to Settings -> Add-ons -> Studio Code Server -> Open Web UI. Next, in the left panel in VS Code, open the automations.yaml file.

Your newly created automation should look like the following example:

- id: '1704866261238'
  alias: My TTS Automation
  description: ''
  trigger:
  - platform: webhook
    allowed_methods:
    - POST
    local_only: true
    webhook_id: tts
  condition: []
  action:
  - service: tts.speak
    metadata: {}
    data:
      cache: true
      media_player_entity_id: media_player.nest
      message: Placeholder Text
      language: en
    target:
      entity_id: tts.google_en_com
  mode: single

Next, move the message and language lines out of data and into a new key named data_template, and add an entity_id field to it with the same value as media_player_entity_id.

Change the message value to "{{ trigger.json.message }}" and the language value to "{{ trigger.json.lang }}".

Example of a finished automation configuration:

- id: "1704866261238"
  alias: My TTS Automation
  description: ""
  trigger:
    - platform: webhook
      allowed_methods:
        - POST
      local_only: true
      webhook_id: tts
  action:
    - service: tts.speak
      metadata: {}
      data:
        cache: true
        media_player_entity_id: media_player.nest
      data_template:
        entity_id: media_player.nest
        message: "{{ trigger.json.message }}"
        language: "{{ trigger.json.lang }}"
      target:
        entity_id: tts.google_en_com
  mode: single
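
The templates in this configuration read the message and lang keys from the JSON body of the incoming request, so a client should send both. A minimal client-side sketch in Python of building such a body follows; the helper name build_payload and its validation rules are assumptions for illustration, not part of Home Assistant.

```python
import json


def build_payload(message: str, lang: str = "en") -> bytes:
    """Return a UTF-8 encoded JSON body with the two keys the templates read."""
    # Reject empty values early, since the automation would otherwise
    # receive nothing useful to speak or an unusable language code.
    if not message.strip():
        raise ValueError("message must not be empty")
    if not lang.strip():
        raise ValueError("lang must not be empty")
    return json.dumps({"message": message, "lang": lang}).encode("utf-8")


payload = build_payload("Hello World! I can speak!")
print(payload.decode("utf-8"))
```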

When you are done editing, save the file in VS Code. Then, navigate to Developer Tools -> YAML, and click AUTOMATIONS in order to reload the automations.

Querying the webhook to trigger the automation

Now you can query the webhook at http://[IP address of HA]:[Port]/api/webhook/[Webhook ID].

The webhook takes a JSON object in the body with the following format:

{
    "message": "Hello World! I can speak!",
    "lang": "en"
}

Substitute en for any of the languages supported by the TTS service selected in the automation (here, the Google Translate TTS entity tts.google_en_com).
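
As a sketch of such a request from Python, using only the standard library: the function names build_request and send_tts, the timeout, and the example URL are assumptions for illustration. The request uses POST, matching the allowed_methods of the automation, and sets a JSON Content-Type header so that Home Assistant can parse the body as JSON.

```python
import json
import urllib.request


def build_request(url: str, message: str, lang: str = "en") -> urllib.request.Request:
    """Prepare a POST request carrying the JSON body the automation expects."""
    body = json.dumps({"message": message, "lang": lang}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_tts(url: str, message: str, lang: str = "en", timeout: float = 10.0) -> int:
    """Send the request and return the HTTP status code of the response."""
    request = build_request(url, message, lang)
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses and
    # urllib.error.URLError if the host cannot be reached.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

For example, calling send_tts("http://192.168.1.10:8123/api/webhook/tts", "Hello World! I can speak!") would trigger the automation, assuming HA is reachable at that (hypothetical) address and the Webhook ID is tts. If local_only is enabled, the request has to come from a device within the local network.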

Happy TTS’ing!
/ eldahl