Add stt-vad-start and stt-vad-end pipeline events

Michael Hansen 2023-08-17 16:35:15 -05:00
parent 68b6b21b2b
commit 6941f6e1d7


@@ -37,17 +37,19 @@ The following input fields are available:
As the pipeline runs, it emits events back over the WebSocket connection.
The following events can be emitted:
-| Name | Description | Emitted | Attributes |
-|-----------------|-----------------------------|------------|---------------------------------------------------------------------------------------------------------|
-| `run-start` | Start of pipeline run | always | `pipeline` - ID of the pipeline<br />`language` - Language used for pipeline<br />`runner_data` - Extra WebSocket data: <ul><li>`stt_binary_handler_id` is the prefix to send speech data over.</li><li>`timeout` is the max run time for the whole pipeline.</li></ul> |
-| `run-end` | End of pipeline run | always | |
-| `stt-start` | Start of speech to text | audio only | `engine`: STT engine used<br />`metadata`: incoming audio metadata
-| `stt-end` | End of speech to text | audio only | `stt_output` - Object with `text`, the detected text.
-| `intent-start` | Start of intent recognition | always | `engine` - [Agent](/docs/intent_conversation_api) engine used<br />`language`: Processing language. <br /> `intent_input` - Input text to agent |
-| `intent-end` | End of intent recognition | always | `intent_output` - [conversation response](/docs/intent_conversation_api#conversation-response) |
-| `tts-start` | Start of text to speech | audio only | `engine` - TTS engine used<br />`language`: Output language.<br />`voice`: Output voice. <br />`tts_input`: Text to speak. |
-| `tts-end` | End of text to speech | audio only | `media_id` - Media Source ID of the generated audio<br />`url` - URL to the generated audio<br />`mime_type` - MIME type of the generated audio<br /> |
-| `error` | Error in pipeline | On error | `code` - Error code<br />`message` - Error message |
+| Name | Description | Emitted | Attributes |
+|-----------------|-----------------------------|------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| `run-start` | Start of pipeline run | always | `pipeline` - ID of the pipeline<br />`language` - Language used for pipeline<br />`runner_data` - Extra WebSocket data: <ul><li>`stt_binary_handler_id` is the prefix to send speech data over.</li><li>`timeout` is the max run time for the whole pipeline.</li></ul> |
+| `run-end` | End of pipeline run | always | |
+| `stt-start` | Start of speech to text | audio only | `engine`: STT engine used<br />`metadata`: incoming audio metadata |
+| `stt-vad-start` | Start of voice command | audio only | `timestamp`: milliseconds after the start of the audio stream |
+| `stt-vad-end` | End of voice command | audio only | `timestamp`: milliseconds after the start of the audio stream |
+| `stt-end` | End of speech to text | audio only | `stt_output` - Object with `text`, the detected text. |
+| `intent-start` | Start of intent recognition | always | `engine` - [Agent](/docs/intent_conversation_api) engine used<br />`language`: Processing language. <br /> `intent_input` - Input text to agent |
+| `intent-end` | End of intent recognition | always | `intent_output` - [conversation response](/docs/intent_conversation_api#conversation-response) |
+| `tts-start` | Start of text to speech | audio only | `engine` - TTS engine used<br />`language`: Output language.<br />`voice`: Output voice. <br />`tts_input`: Text to speak. |
+| `tts-end` | End of text to speech | audio only | `media_id` - Media Source ID of the generated audio<br />`url` - URL to the generated audio<br />`mime_type` - MIME type of the generated audio<br /> |
+| `error` | Error in pipeline | On error | `code` - Error code<br />`message` - Error message |
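The `stt-vad-start` and `stt-vad-end` events added in this commit both carry a `timestamp` in milliseconds after the start of the audio stream. A client can therefore subtract the two to get the length of the detected voice command.

The following is a minimal Python sketch of that idea, not code from Home Assistant. It assumes each event has already been decoded from its WebSocket message into a dict shaped like `{"type": ..., "data": {...}}`; the real message envelope may differ. `SttSummary` and `summarize_stt` are hypothetical names. Only the event names and the `timestamp`, `stt_output`/`text`, and `code` attributes come from the table.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Optional


@dataclass
class SttSummary:
    """Hypothetical summary of the speech-to-text part of one pipeline run."""

    vad_start_ms: Optional[int] = None  # `timestamp` of stt-vad-start
    vad_end_ms: Optional[int] = None    # `timestamp` of stt-vad-end
    text: Optional[str] = None          # `stt_output["text"]` of stt-end
    error_code: Optional[str] = None    # `code` of error

    @property
    def command_duration_ms(self) -> Optional[int]:
        """Length of the voice command, or None if either VAD event is missing."""
        if self.vad_start_ms is None or self.vad_end_ms is None:
            return None
        # Both timestamps are milliseconds after the start of the audio stream.
        return self.vad_end_ms - self.vad_start_ms


def summarize_stt(events: Iterable[dict[str, Any]]) -> SttSummary:
    """Fold decoded pipeline events into an SttSummary.

    Assumption: each event is a dict shaped like {"type": ..., "data": {...}};
    the real WebSocket message envelope may wrap or name these differently.
    """
    summary = SttSummary()
    for event in events:
        event_type = event.get("type")
        data = event.get("data") or {}
        if event_type == "stt-vad-start":
            summary.vad_start_ms = data["timestamp"]
        elif event_type == "stt-vad-end":
            summary.vad_end_ms = data["timestamp"]
        elif event_type == "stt-end":
            summary.text = data["stt_output"]["text"]
        elif event_type == "error":
            summary.error_code = data.get("code")
        # Other events (run-start, intent-*, tts-*, run-end) are ignored here.
    return summary


# Example with made-up values for an audio pipeline run.
events = [
    {"type": "run-start", "data": {"pipeline": "example", "language": "en"}},
    {"type": "stt-start", "data": {"engine": "stt.example"}},
    {"type": "stt-vad-start", "data": {"timestamp": 850}},
    {"type": "stt-vad-end", "data": {"timestamp": 2350}},
    {"type": "stt-end", "data": {"stt_output": {"text": "turn on the lights"}}},
    {"type": "run-end", "data": {}},
]
summary = summarize_stt(events)
print(summary.command_duration_ms)  # 1500
print(summary.text)  # turn on the lights
```

The sketch ignores events it does not recognize, such as `intent-start` or `tts-end`. Handling another event type only takes one more `elif` branch.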
## Sending speech data