From 0d008076b3e421a35e5644188e8c1af2b1d1559d Mon Sep 17 00:00:00 2001
From: Paulus Schoutsen <balloob@gmail.com>
Date: Tue, 22 Apr 2025 15:37:35 -0400
Subject: [PATCH] Document missing voice features (#2649)

Co-authored-by: Michael Hansen <hansen.mike@gmail.com>
---
 docs/core/entity/conversation.md | 6 ++++++
 docs/intent_conversation_api.md  | 5 ++++-
 docs/voice/pipelines/index.md    | 5 +++--
 3 files changed, 13 insertions(+), 3 deletions(-)
diff --git a/docs/core/entity/conversation.md b/docs/core/entity/conversation.md
index db1e7bcc..abc3336b 100644
--- a/docs/core/entity/conversation.md
+++ b/docs/core/entity/conversation.md
@@ -72,6 +72,12 @@ A `ConversationInput` object contains the following data:
 
 _We used to promote `async_process` as the method to process messages. This was changed to `_async_handle_message` to automatically include the chat log. The change is backwards compatible._
 
+#### Chat log
+
+The chat log object allows the conversation entity to read the conversation history and to add messages and tool calls to it.
+
+See [the Python interface](https://github.com/home-assistant/core/blob/dev/homeassistant/components/conversation/chat_log.py) for the full typed API.
+
 ### Prepare
 
 As soon as Home Assistant knows a request is coming in, we will let the conversation entity prepare for it. This can be used to load a language model or other resources. This function is optional to implement.
diff --git a/docs/intent_conversation_api.md b/docs/intent_conversation_api.md
index b1e0bf9f..e2f72790 100644
--- a/docs/intent_conversation_api.md
+++ b/docs/intent_conversation_api.md
@@ -44,6 +44,7 @@ The JSON response from `/api/conversation/process` contains information about th
 
 ```json
 {
+  "continue_conversation": true,
   "response": {
     "response_type": "action_done",
     "language": "en",
@@ -91,6 +92,8 @@ The following properties are available in the `"response"` object:
 
 The [conversation id](#conversation-id) is returned alongside the conversation response.
 
+If `continue_conversation` is `true`, the conversation agent expects a follow-up from the user.
+
 
 ## Response types
 
@@ -287,7 +290,7 @@ POST with the next input sentence:
 ```
 
 
-## Pre-loading sentences 
+## Pre-loading sentences
 
 Sentences for a language can be pre-loaded using the WebSocket API:
 
diff --git a/docs/voice/pipelines/index.md b/docs/voice/pipelines/index.md
index df196e85..c87c92cf 100644
--- a/docs/voice/pipelines/index.md
+++ b/docs/voice/pipelines/index.md
@@ -41,7 +41,7 @@ The following events can be emitted:
 
 | Name           | Description                  | Emitted    | Attributes                                                                                                                                                                                                                                                              |
 |----------------|------------------------------|------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
-| `run-start`    | Start of pipeline run        | always     | `pipeline` - ID of the pipeline<br />`language` - Language used for pipeline<br />`runner_data` - Extra WebSocket data: <ul><li>`stt_binary_handler_id` is the prefix to send speech data over.</li><li>`timeout` is the max run time for the whole pipeline.</li></ul> |
+| `run-start`    | Start of pipeline run        | always     | `pipeline` - ID of the pipeline<br />`language` - Language used for pipeline<br />`runner_data` - Extra WebSocket data: <ul><li>`stt_binary_handler_id` is the prefix to send speech data over.</li><li>`timeout` is the max run time for the whole pipeline.</li></ul><br />`tts_output` - TTS output data: <ul><li>`token` - Token of the generated audio</li><li>`url` - URL to the generated audio</li><li>`mime_type` - MIME type of the generated audio</li></ul> |
 | `run-end`      | End of pipeline run          | always     |                                                                                                                                                                                                                                                                         |
 | `wake_word-start`   | Start of wake word detection | audio only | `engine`: wake engine used<br />`metadata`: incoming audio<br />`timeout`: seconds before wake word timeout metadata                                                                                                                                                                                                     |
 | `wake_word-end`     | End of wake word detection   | audio only | `wake_word_output` - Detection result data: <ul><li>`wake_word_id` is the id of detected wake word</li><li>`timestamp` is the detection time relative to start of audio stream (milliseconds, optional)</li></ul>                                                                             |
@@ -50,9 +50,10 @@ The following events can be emitted:
 | `stt-vad-end`    | End of voice command      | audio only | `timestamp`: time relative to start of audio stream (milliseconds)
 | `stt-end`      | End of speech to text        | audio only | `stt_output` - Object with `text`, the detected text.                                                                                                                                                                                                                   |
 | `intent-start` | Start of intent recognition  | always     | `engine` - [Agent](/docs/intent_conversation_api) engine used<br />`language`: Processing language. <br /> `intent_input` - Input text to agent                                                                                                                         |
+| `intent-progress`   | Intermediate update of intent recognition    | depending on conversation agent     | `chat_log_delta` - delta object from the [chat log](/docs/core/entity/conversation#chat-log)                                                                                                                                                                          |
 | `intent-end`   | End of intent recognition    | always     | `intent_output` - [conversation response](/docs/intent_conversation_api#conversation-response)                                                                                                                                                                          |
 | `tts-start`    | Start of text to speech      | audio only | `engine` - TTS engine used<br />`language`: Output language.<br />`voice`: Output voice. <br />`tts_input`: Text to speak.                                                                                                                                              |
-| `tts-end`      | End of text to speech        | audio only | `media_id` - Media Source ID of the generated audio<br />`url` - URL to the generated audio<br />`mime_type` - MIME type of the generated audio<br />                                                                                                                   |
+| `tts-end`      | End of text to speech        | audio only | `token` - Token of the generated audio<br />`url` - URL to the generated audio<br />`mime_type` - MIME type of the generated audio<br />                                                                                                                   |
 | `error`        | Error in pipeline            | on error   | `code` - Error code ([see below](#error-codes))<br />`message` - Error message                                                                                                                                                                                                                      |
 
 ## Error codes