Improved TTS Integration documentation (#39207)

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-07-18 23:06:58 +00:00 · 2025-05-27 19:20:35 +02:00 · 2025-05-27 19:20:35 +02:00 · b9999c033a
commit b9999c033a
parent 77def4ba8b
1 changed files with 37 additions and 4 deletions
--- a/source/_integrations/tts.markdown
+++ b/source/_integrations/tts.markdown
@ -46,7 +46,7 @@ Screenshot showing the state of a text-to-speech entity in the developer tools.

 Modern platforms will create entities under the `tts` domain, where each entity represents one text-to-speech service provider. These entities may be used as targets for the `tts.speak` action.

-the `tts.speak` action supports `language` and on some platforms also `options` for settings, e.g., _voice, motion, speed, etc_. The text that should be spoken is set with `message`, and the media player that should output the sound is selected with `media_player_entity_id`.
+The `tts.speak` action supports `message`, `language`, `cache`, `media_player_entity_id` and `options` options. The text that should be spoken is set with `message`, and the media player that should output the sound is selected with `media_player_entity_id`. The language can be set with `language`, using the format required by the target entity platform (refer to specific platform documentation). See [cache section](#cache) for information on `cache` option. Additional settings can be specified with the `options` option, which include preferred audio settings (see [preferred audio settings](#preferred-audio-settings) section for more info) and further settings of the target entity platform, e.g., _voice, motion, speed, etc._ (refer to specific platform documentation for any supported settings).

 ```yaml
 action: tts.speak
@ -59,7 +59,7 @@ data:

 ### Action say (legacy)

-The `say` action supports `language` and on some platforms also `options` for settings, e.g., _voice, motion, speed, etc_. The text that should be spoken is set with `message`. Since release 0.92, action name can be defined in configuration `service_name` option.
+The `say` action supports `message`, `language`, `cache` and `options` options. The text that should be spoken is set with `message`. The language can be set with `language`, using the format required by the platform (refer to specific platform documentation). See [cache section](#cache) for information on `cache` option. Additional settings can be specified with the `options` option, which include preferred audio settings (see [preferred audio settings](#preferred-audio-settings) section for more info) and further settings of the target platform, e.g., _voice, motion, speed, etc._ (refer to specific platform documentation for any supported settings). Since release 0.92, action name can be defined in configuration `service_name` option.

 Say to all `media_player` entities:

@ -105,13 +105,40 @@ data:

 ## Cache

-The integration cache can be controlled with the `cache` option in the action to `speak` or `say`. A long time cache will be located on the file system. The in-memory cache for fast responses to media players will be auto-cleaned after a short period.
+The integration cache can be controlled with the `cache` option in the action to `speak` or `say`, setting it to `True` to enable it (default), or `False` to disable it. A long time cache will be located on the file system. The in-memory cache for fast responses to media players will be auto-cleaned after a short period.
+
+## Preferred audio settings
+
+Each TTS platform produces audio samples in different formats, not always compatible with every media player. TTS integration building block supports a way to configure preferred target audio format through `options` option of `speak` or `say` actions.
+
+TTS integration building block uses [FFmpeg integration](/integrations/ffmpeg) to perform audio transcoding when target entity platform does not support one or all the specified preferred audio format settings (refer to specific platform documentation for any supported setting with related supported values).
+
+Available preferred audio settings, all optional, are:
+
+- `preferred_format`: Set the audio format. When not supported by the target entity platform, the value is a file extension like `wav`, `mp3`, `ogg`, etc., among ones supported by FFmpeg tool for output files.
+- `preferred_sample_rate`: Set the sample rate. When not supported by the target entity platform, the value is in Hz as a number, among ones supported by the `-ar` parameter of FFmpeg tool.
+- `preferred_sample_channels`: Set the number of audio channels. When not supported by the target entity platform, the value is a number among ones supported by the `-ac` parameter of FFmpeg tool.
+- `preferred_sample_bytes`: Set the audio bit sampling. When not supported by the target entity platform, can only be set to `2` to use 16-bit audio sampling (any other value is ignored).
+
+Example to produce an MP3 audio at 22050Hz:
+
+```yaml
+action: tts.speak
+target:
+  entity_id: tts.example
+data:
+  media_player_entity_id: media_player.kitchen
+  message: "May the force be with you."
+  options:
+    preferred_format: mp3
+    preferred_sample_rate: 22050
+```

 ## REST API

 ### POST `/api/tts_get_url`

-Returns a URL to the generated TTS file. The `engine_id` or `platform` parameter together with `message` are required.
+Returns a URL to the generated TTS file. The `engine_id` (which is the entity id) or `platform` parameter together with `message` are required. Additional parameters `cache`, `language` and `options` are supported, as JSON attributes, as described for `speak` action.

 ```json
 {
@ -166,3 +193,9 @@ These requirements present the following problems, all of which create problems
 - If you are using SSL (e.g., `https://yourhost.example.org/...`) then you _must_ use the hostname in the certificate (e.g., `external_url: https://yourhost.example.org`). You cannot use an IP address since the certificate won't be valid for the IP address, and the cast device will refuse the connection.

 The recommended way to overcome these obstacles is to not manually configure a local Home Assistant URL.
+
+### Partial, corrupted or no audio
+
+Some media players could reproduce only partial, corrupted or no audio at all when the audio format is not fully supported. In such cases it is required to experiment with different combinations of audio formats, channels, sample rates and bits using [preferred audio settings](#preferred-audio-settings) options.
+
+For example, some Google Cast devices skip initial audio part when the audio is sampled at 22050Hz, and to fix the problem it is required to set the `preferred_sample_rate` setting in the `options` option to `44100`.