From 0d008076b3e421a35e5644188e8c1af2b1d1559d Mon Sep 17 00:00:00 2001 From: Paulus Schoutsen Date: Tue, 22 Apr 2025 15:37:35 -0400 Subject: [PATCH] Document missing voice features (#2649) Co-authored-by: Michael Hansen --- docs/core/entity/conversation.md | 6 ++++++ docs/intent_conversation_api.md | 5 ++++- docs/voice/pipelines/index.md | 5 +++-- 3 files changed, 13 insertions(+), 3 deletions(-) diff --git a/docs/core/entity/conversation.md b/docs/core/entity/conversation.md index db1e7bcc..abc3336b 100644 --- a/docs/core/entity/conversation.md +++ b/docs/core/entity/conversation.md @@ -72,6 +72,12 @@ A `ConversationInput` object contains the following data: _We used to promote `async_process` as the method to process messages. This was changed to `_async_handle_message` to automatically include the chat log. The change is backwards compatible._ +#### Chat log + +The chat log object allows the conversation entity to read the conversation history and to add messages and tool calls to it. + +See [the Python interface](https://github.com/home-assistant/core/blob/dev/homeassistant/components/conversation/chat_log.py) for the full typed API. + ### Prepare As soon as Home Assistant knows a request is coming in, we will let the conversation entity prepare for it. This can be used to load a language model or other resources. This function is optional to implement. diff --git a/docs/intent_conversation_api.md b/docs/intent_conversation_api.md index b1e0bf9f..e2f72790 100644 --- a/docs/intent_conversation_api.md +++ b/docs/intent_conversation_api.md @@ -44,6 +44,7 @@ The JSON response from `/api/conversation/process` contains information about th ```json { + "continue_conversation": true, "response": { "response_type": "action_done", "language": "en", @@ -91,6 +92,8 @@ The following properties are available in the `"response"` object: The [conversation id](#conversation-id) is returned alongside the conversation response. +If `continue_conversation` is set to true, the conversation agent expects a follow-up from the user. + ## Response types @@ -287,7 +290,7 @@ POST with the next input sentence: ``` -## Pre-loading sentences +## Pre-loading sentences Sentences for a language can be pre-loaded using the WebSocket API: diff --git a/docs/voice/pipelines/index.md b/docs/voice/pipelines/index.md index df196e85..c87c92cf 100644 --- a/docs/voice/pipelines/index.md +++ b/docs/voice/pipelines/index.md @@ -41,7 +41,7 @@ The following events can be emitted: | Name | Description | Emitted | Attributes | |----------------|------------------------------|------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `run-start` | Start of pipeline run | always | `pipeline` - ID of the pipeline
`language` - Language used for pipeline
`runner_data` - Extra WebSocket data: | +| `run-start` | Start of pipeline run | always | `pipeline` - ID of the pipeline
`language` - Language used for pipeline
`runner_data` - Extra WebSocket data:
`tts_output` - TTS Output data | | `run-end` | End of pipeline run | always | | | `wake_word-start` | Start of wake word detection | audio only | `engine`: wake engine used
`metadata`: incoming audio
`timeout`: seconds before wake word timeout metadata | | `wake_word-end` | End of wake word detection | audio only | `wake_word_output` - Detection result data: | @@ -50,9 +50,10 @@ The following events can be emitted: | `stt-vad-end` | End of voice command | audio only | `timestamp`: time relative to start of audio stream (milliseconds) | `stt-end` | End of speech to text | audio only | `stt_output` - Object with `text`, the detected text. | | `intent-start` | Start of intent recognition | always | `engine` - [Agent](/docs/intent_conversation_api) engine used
`language`: Processing language.
`intent_input` - Input text to agent | +| `intent-progress` | Intermediate update of intent recognition | depending on conversation agent | `chat_log_delta` - delta object from the [chat log](/docs/core/entity/conversation#chat-log) | | `intent-end` | End of intent recognition | always | `intent_output` - [conversation response](/docs/intent_conversation_api#conversation-response) | | `tts-start` | Start of text to speech | audio only | `engine` - TTS engine used
`language`: Output language.
`voice`: Output voice.
`tts_input`: Text to speak. | -| `tts-end` | End of text to speech | audio only | `media_id` - Media Source ID of the generated audio
`url` - URL to the generated audio
`mime_type` - MIME type of the generated audio
| +| `tts-end` | End of text to speech | audio only | `token` - Token of the generated audio
`url` - URL to the generated audio
`mime_type` - MIME type of the generated audio
| | `error` | Error in pipeline | on error | `code` - Error code ([see below](#error-codes))
`message` - Error message | ## Error codes