diff --git a/docs/voice/pipelines/index.md b/docs/voice/pipelines/index.md
index 38651bd0..0936d12b 100644
--- a/docs/voice/pipelines/index.md
+++ b/docs/voice/pipelines/index.md
@@ -1,8 +1,8 @@
 ---
-title: "Pipelines"
+title: "Assist Pipelines"
 ---
-A pipeline runs the common steps of a [voice assistant](https://next.home-assistant.io/integrations/voice_assistant):
+The [Assist pipeline](https://next.home-assistant.io/integrations/assist_pipeline) integration runs the common steps of a voice assistant:
 1. Speech to text
 2. Intent recognition
@@ -12,8 +12,13 @@ Pipelines are run via a WebSocket API:
 ```json
 {
-  "type": "voice_assistant/run",
-  "language": "en-US"
+  "type": "assist_pipeline/run",
+  "language": "en-US",
+  "start_stage": "stt",
+  "end_stage": "tts",
+  "input": {
+    "sample_rate": 16000
+  }
 }
 ```
@@ -21,9 +26,9 @@ The following input fields are available:
 | Name              | Type   | Description |
 |-------------------|--------|---------------------------------------------------------------------------------------------|
-| `intent_input` | string | Required. Input text to process. |
-| `language` | string | Optional. Language of pipeline to run (default: configured language in HA). |
-| `pipeline` | string | Optional. Id of a pipeline to run (default: use first one that matches specified language). |
+| `start_stage` | enum | Required. The first stage to run. One of `stt`, `intent`, `tts`. |
+| `end_stage` | enum | Required. The last stage to run. One of `stt`, `intent`, `tts`. |
+| `input` | dict | Depends on `start_stage`. For `stt`, the dictionary must contain a key `sample_rate` with an integer value. For `intent` and `tts`, the key `text` must contain the input text. |
 | `conversation_id` | string | Optional. [Unique id for conversation](/docs/intent_conversation_api#conversation-id). |
 | `timeout` | number | Optional. Number of seconds before pipeline times out (default: 30). |
@@ -34,10 +39,13 @@ The following events can be emitted:
 | Name            | Description                 | Emitted    | Attributes |
 |-----------------|-----------------------------|------------|---------------------------------------------------------------------------------------------------------|
-| `run-start` | Start of pipeline run | always | `pipeline` - Id of pipeline
`language` - Language used for pipeline |
+| `run-start` | Start of pipeline run | always | `pipeline` - Name of pipeline
`pipeline_id` - ID of the pipeline
`language` - Language used for pipeline
`runner_data` - Extra WebSocket data:
  • `stt_binary_handler_id` is the prefix to send speech data over.
  • `timeout` is the max run time for the whole pipeline. |
 | `run-finish` | End of pipeline run | always | |
-| `intent-start` | Start of intent recognition | always | `engine` - [Agent](/docs/intent_conversation_api) engine used
`intent_input` - Input text to agent |
+| `stt-start` | Start of speech to text | audio only | `engine` - STT engine used
`metadata` - Incoming audio metadata |
+| `stt-finish` | End of speech to text | audio only | `stt_output` - Object with `text`, the detected text |
+| `intent-start` | Start of intent recognition | always | `engine` - [Agent](/docs/intent_conversation_api) engine used
`language` - Processing language
`intent_input` - Input text to agent |
 | `intent-finish` | End of intent recognition | always | `intent_output` - [conversation response](/docs/intent_conversation_api#conversation-response) |
-| `tts-start` | Start of text to speech | audio only | `tts_input` - text to speak |
-| `tts-finish` | End of text to speech | audio only | `tts_otuput` - URL of spoken audio |
+| `tts-start` | Start of text to speech | audio only | `engine` - TTS engine used
`language` - Output language
`voice` - Output voice
`tts_input` - Text to speak |
+| `tts-finish` | End of text to speech | audio only | `media_id` - Media Source ID of the generated audio
`url` - URL to the generated audio
`mime_type` - MIME type of the generated audio |
+| `error` | Error in pipeline | on error | `code` - Error code
`message` - Error message |
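
For illustration, a minimal text-only client for the API above could look like the sketch below. It assumes a local instance at `ws://localhost:8123/api/websocket`, a long-lived access token in the `HA_TOKEN` environment variable, the usual Home Assistant `id`/`type` command envelope with events delivered as `{"type": "event", "event": {...}}` messages, and the third-party `websockets` package; none of these details are spelled out in the tables above, so treat it as a sketch rather than a reference client.

```python
"""Sketch: run an Assist pipeline from intent recognition through TTS."""
import asyncio
import json
import os

import websockets  # third-party: pip install websockets

WS_URL = "ws://localhost:8123/api/websocket"  # assumed local instance


async def run_text_pipeline(text: str) -> None:
    async with websockets.connect(WS_URL) as ws:
        # Standard Home Assistant WebSocket authentication handshake.
        await ws.recv()  # {"type": "auth_required"}
        await ws.send(json.dumps({"type": "auth", "access_token": os.environ["HA_TOKEN"]}))
        await ws.recv()  # {"type": "auth_ok"} (or "auth_invalid")

        # Start at intent recognition and stop after text to speech,
        # so `input` carries `text` rather than audio metadata.
        await ws.send(json.dumps({
            "id": 1,
            "type": "assist_pipeline/run",
            "start_stage": "intent",
            "end_stage": "tts",
            "input": {"text": text},
        }))

        # Print events until the run finishes or errors out.
        while True:
            msg = json.loads(await ws.recv())
            if msg.get("type") != "event":
                continue  # e.g. the initial command acknowledgement
            event = msg["event"]
            print(event["type"], event.get("data"))
            if event["type"] in ("run-finish", "error"):
                break


if __name__ == "__main__":
    asyncio.run(run_text_pipeline("turn on the kitchen light"))
```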
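
Streaming speech for an `stt` run uses the `stt_binary_handler_id` from `runner_data` as a prefix on binary WebSocket frames. The sketch below continues from the authenticated connection in the previous example and additionally assumes raw 16 kHz mono PCM audio read from a file, a single prefix byte per frame, and an empty frame (prefix only) as the end-of-stream marker; these framing details are assumptions, not taken from the tables above.

```python
"""Sketch: an audio (stt -> tts) run on an already authenticated connection."""
import json


async def run_audio_pipeline(ws, audio_path: str) -> None:
    await ws.send(json.dumps({
        "id": 2,
        "type": "assist_pipeline/run",
        "start_stage": "stt",
        "end_stage": "tts",
        "input": {"sample_rate": 16000},
    }))

    while True:
        msg = json.loads(await ws.recv())
        if msg.get("type") != "event":
            continue
        event = msg["event"]

        if event["type"] == "run-start":
            # Prefix byte for binary speech frames, announced in runner_data.
            prefix = bytes([event["data"]["runner_data"]["stt_binary_handler_id"]])
            with open(audio_path, "rb") as audio:
                while chunk := audio.read(2048):
                    await ws.send(prefix + chunk)
            await ws.send(prefix)  # assumed end-of-stream marker

        elif event["type"] == "tts-finish":
            print("TTS audio available at:", event["data"]["url"])

        elif event["type"] in ("run-finish", "error"):
            break
```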