mirror of
https://github.com/home-assistant/home-assistant.io.git
synced 2025-11-14 13:31:52 +00:00
126 lines
7.4 KiB
Markdown
---
title: "Getting started - Local"
related:
  - docs: /voice_control/best_practices/
    title: Best practices with Assist
  - docs: /voice_control/expanding_assist/
    title: Expanding Assist
  - docs: /voice_control/voice_remote_local_assistant/
    title: Creating a local assistant
  - docs: /voice_control/voice_remote_expose_devices/
    title: Exposing devices to Assist
  - url: https://support.nabucasa.com/hc/en-us/categories/24451727188125-Home-Assistant-Voice-Preview-Edition
    title: Voice Preview Edition - Documentation
---

The simplest and most effective way to use Assist is to use the voice providers (for speech-to-text and text-to-speech) included in [Home Assistant Cloud](/voice_control/voice_remote_cloud_assistant/).

If you would rather set up a fully local voice assistant, follow the steps on this page.

## Prerequisites

For Assist to talk to your Home Assistant setup, the setup needs to be able to listen, understand, and then talk back.

In Home Assistant, an Assist pipeline is made up of several components that together form a voice assistant. For each component, you can choose from different options.

- For listening and talking back, it needs your phone with the Home Assistant app, or a voice-activated device.
- For understanding, it needs speech-to-text and text-to-speech software integrated.
- For running everything together, it needs the Home Assistant Operating System.

## Some options for speech-to-text and text-to-speech

There are speech-to-text and text-to-speech options that run entirely locally. No data is sent to external servers for processing.

### Speech-to-text engines

There are currently two options for running speech-to-text locally: **Speech-to-Phrase** and **Whisper**.

#### Speech-to-Phrase

[Speech-to-Phrase](https://github.com/OHF-voice/speech-to-phrase) is a close-ended speech model.

- It only transcribes phrases it already knows.
- Transcription is extremely fast (under one second), even on a Home Assistant Green or Raspberry Pi 4.
- It only supports a subset of Assist’s voice commands.
- More open-ended items such as shopping lists, naming a timer, and broadcasts are *not* usable out of the box.
- It supports [various languages](https://github.com/OHF-voice/speech-to-phrase?tab=readme-ov-file#supported-languages).
- These qualities make it a great option for home control.

#### Whisper

[Whisper](https://github.com/openai/whisper) is an open-ended speech model.

- It will try to transcribe everything.
- The cost is slower processing speed:
  - On a Raspberry Pi 4, it takes around 8 seconds to process incoming voice commands.
  - On an Intel NUC, it is done in under a second.
- It supports [various languages](https://github.com/openai/whisper#available-models-and-languages).
- Whisper is only a great option in the following case:
  1. You have powerful hardware at home.
  2. You plan to extend your voice setup beyond simple home control. For example, by pairing your assistant with an LLM-based agent.

### Text-to-speech engine

For text-to-speech, we have developed [Piper](https://github.com/rhasspy/piper). Piper is a fast, local neural text-to-speech system that sounds great and is optimized for the Raspberry Pi 4. It supports [many languages](https://rhasspy.github.io/piper-samples/). On a Raspberry Pi, using medium-quality models, it can generate 1.6 seconds of voice per second of processing.

Be sure to check how well each speech-to-text and text-to-speech option works in your language, since quality can vary quite a bit.

## Installing a local Assist pipeline

For the quickest way to get your local Assist pipeline started, follow these steps:

1. Install the add-ons to convert text into speech and vice versa.
   - Install the speech-to-text add-on of your choice, either {% my supervisor_addon addon="core_speech-to-phrase" title="**Speech-to-Phrase**" %} or {% my supervisor_addon addon="core_whisper" title="**Whisper**" %}.
   - Install {% my supervisor_addon addon="core_piper" title="**Piper**" %} for text-to-speech.
   - Start the add-ons.
   - Once the add-ons are started, head over to the integrations under {% my integrations title="**Settings** > **Devices & Services**" %}.
   - You should now see both services being discovered by the [Wyoming integration](/integrations/wyoming/).
   - For each integration, select **Add**.
   - You have now integrated a local speech-to-text engine of your choice (either {% my supervisor_addon addon="core_speech-to-phrase" title="**Speech-to-Phrase**" %} or {% my supervisor_addon addon="core_whisper" title="**Whisper**" %}) and a text-to-speech engine ({% my supervisor_addon addon="core_piper" title="**Piper**" %}).

2. Set up your assistant.

   - Go to {% my voice_assistants title="**Settings** > **Voice assistants**" %} and select **Add assistant**.

   - **Troubleshooting**: If you do not see any assistants here, you are not using the default configuration. In this case, you need to add the following to your {% term "configuration.yaml" %} file:

     ```yaml
     # Example configuration.yaml entry
     assist_pipeline:
     ```

   - Enter a name. You can pick any name that is meaningful to you.
   - Select the language that you want to speak.
   - Under **Conversation agent**, select **Home Assistant**.
   - Under **Speech-to-text**, select the speech-to-text engine you chose in the previous step (either **Whisper** or **Speech-to-Phrase**). Select the language.
   - Under **Text-to-speech**, select **Piper**. Select the language.
   - Depending on your language, you may be able to select different language variants.

3. That's it. Your voice commands can now be processed locally on your device.
4. If you haven't done so yet, [expose your devices to Assist](/voice_control/voice_remote_expose_devices/#exposing-your-devices).
   - Otherwise you won't be able to control them by voice.
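
Once the steps above are complete, one way to check that the conversation agent understands a command, without speaking, is to send it a text sentence from **Developer tools** > **Actions**. The following is a minimal sketch, not an official test procedure: it assumes the built-in agent is available as `conversation.home_assistant`, that the assistant language is English, and that a light named "kitchen" has been exposed to Assist. All three are placeholders to adapt to your setup.

```yaml
# Hypothetical test action: the sentence, language, and agent ID are assumptions
action: conversation.process
data:
  text: "Turn on the kitchen light"
  language: en
  agent_id: conversation.home_assistant
```

This only exercises the conversation agent. The speech-to-text and text-to-speech stages still need to be tried with a real voice command.
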

## Fine-tuning Whisper and Piper for your setup

Would you like to tweak the configuration for better performance or accuracy?
The video below, presented by Mike Hansen, creator of Rhasspy, Piper, and Wyoming, walks through some of the options.

<lite-youtube videoid="Tk-pnm7FY7c" videoStartAt="1589" videotitle="Configure your local Assist pipeline for your setup"></lite-youtube>

The options are also documented in the add-ons themselves. Go to the {% my supervisor_addon addon="core_whisper" title="**Whisper**" %} or the {% my supervisor_addon addon="core_piper" title="**Piper**" %} add-on and open the **Documentation** page.

Also be sure to check the tutorial on [using Piper in automations](/voice_control/using_tts_in_automation/).
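
As a rough illustration of what using Piper in an automation can look like, the sketch below calls the `tts.speak` action. It assumes the Piper text-to-speech entity is named `tts.piper` and uses a placeholder media player, `media_player.living_room`; check the entity IDs in your own installation before using it.

```yaml
# Hypothetical automation action: both entity IDs are placeholders
action: tts.speak
target:
  entity_id: tts.piper
data:
  media_player_entity_id: media_player.living_room
  message: "The front door is open."
```
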

## Learning more about Speech-to-Phrase

You can check out [Voice Chapter 9](/blog/2025/02/13/voice-chapter-9-speech-to-phrase/) to learn more about why we introduced Speech-to-Phrase, and why it's a great option for home control.

<lite-youtube videoid="k6VvzDSI8RU" videotitle="Voice Chapter 9"></lite-youtube>

## Next steps

Once Assist is configured, you can start using it. You can now talk through your device ([Android](/voice_control/android/), [iOS](/voice_control/apple/), or [Voice Preview Edition](https://support.nabucasa.com/hc/en-us/categories/24451727188125-Home-Assistant-Voice-Preview-Edition)).

To get the best out of the voice interaction, don't forget to check the [best practices](/voice_control/best_practices/).