Add voice blog (#27172)

* Add voice blog

* Cursive

* Link assist

* Add affil text

* Apply suggestions from code review

Co-authored-by: Franck Nijhof <git@frenck.dev>

---------

Co-authored-by: Franck Nijhof <git@frenck.dev>
Paulus Schoutsen 2023-04-27 16:32:36 -04:00 committed by GitHub
parent c6328ca499
commit 91c947d673
7 changed files with 303 additions and 0 deletions

@ -0,0 +1,162 @@
---
layout: post
title: "Year of the Voice - Chapter 2: Let's talk"
description: "Talk to your smart home and let it talk back with our new voice assistant features."
date: 2023-04-27 00:00:00
date_formatted: "April 27, 2023"
author: Paulus Schoutsen
comments: true
categories: Assist
og_image: /images/blog/2023-04-27-year-of-the-voice-chapter-2/social.png
---
This year is Home Assistant's [Year of the Voice]. It is our goal for 2023 to let users control Home Assistant in their own language. Today we're presenting Chapter 2, our second milestone in building towards this goal.
In [Chapter 1], we focused on intents: what the user wants to do. Today, the Home Assistant community has translated common smart home commands and responses into [45 languages], closing in on the 62 languages that Home Assistant supports.
For Chapter 2, we've expanded beyond text to now include audio; specifically, turning audio (speech) into text, and text back into speech. With this functionality, [Home Assistant's Assist feature][assist] is now able to provide a full voice interface for users to interact with.
A voice assistant also needs hardware, so today we're launching ESPHome support for Assist and, to top it off, we're launching the World's Most Private Voice Assistant. Keep reading to see what that entails.
_To watch the video presentation of this blog post, including live demos, check [the recording of our live stream][live-stream]._
<lite-youtube videoid="0YJzLIMrnGk" videotitle="World's Most Private Voice Assistant"></lite-youtube>
[Year of the Voice]: https://www.home-assistant.io/blog/2022/12/20/year-of-voice/
[Chapter 1]: https://www.home-assistant.io/blog/2023/01/26/year-of-the-voice-chapter-1/
[45 languages]: https://home-assistant.github.io/intents/
[live-stream]: https://youtube.com/live/Tk-pnm7FY7c?feature=share
[assist]: /docs/assist/
<!--more-->
## Composing Voice Assistants
The new [Assist Pipeline integration] allows you to configure all components that make up a voice assistant in a single place.
For voice commands, pipelines start with audio. A speech-to-text system determines the words the user speaks, which are then forwarded to a conversation agent. The intent is extracted from the text by the agent and executed by Home Assistant. At this point, “turn on the light” would cause your light to turn on 💡. The last part of the pipeline is text-to-speech, where the agent's response is spoken back to you. This may be a simple confirmation (“Turned on light”) or the answer to a question, such as “Which lights are on?”
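To make the flow concrete, here is a minimal Python sketch of the three stages chained together. It is only an illustration of the idea, not Home Assistant's implementation: the `Pipeline` class and the `fake_stt`, `fake_agent`, and `fake_tts` functions are hypothetical stand-ins, and the "audio" is simply UTF-8 bytes so the example runs without any speech services.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Pipeline:
    """Three pluggable stages, mirroring the flow described above (hypothetical)."""

    speech_to_text: Callable[[bytes], str]
    conversation_agent: Callable[[str], str]
    text_to_speech: Callable[[str], bytes]

    def run(self, audio: bytes) -> bytes:
        text = self.speech_to_text(audio)         # audio -> words
        response = self.conversation_agent(text)  # words -> intent -> reply
        return self.text_to_speech(response)      # reply -> audio


# Stand-in state for the smart home: a single light that starts off.
lights = {"light": False}


def fake_stt(audio: bytes) -> str:
    # Assumption: pretend the audio bytes are already the spoken words.
    return audio.decode("utf-8")


def fake_agent(text: str) -> str:
    # Recognize exactly one command and execute its intent.
    if text.lower().strip() == "turn on the light":
        lights["light"] = True
        return "Turned on light"
    return "Sorry, I didn't understand that"


def fake_tts(text: str) -> bytes:
    # Assumption: pretend the encoded text is synthesized speech.
    return text.encode("utf-8")


pipeline = Pipeline(fake_stt, fake_agent, fake_tts)
reply = pipeline.run(b"turn on the light")
```

Because each stage is just a callable, swapping one speech service for another means replacing a single field, which is the mix-and-match idea behind having several assistants.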
<p class='img'>
<img src='/images/blog/2023-04-27-year-of-the-voice-chapter-2/assist-config.png'>
Screenshot of the new Assist configuration in Home Assistant.
</p>
With the new Voice Assistant settings page, users can create multiple assistants, mixing and matching voice services. Want a U.S. English assistant that responds with a British accent? No problem. What about a second assistant that listens for Dutch, German, or French voice commands? Or maybe you want to throw ChatGPT in the mix. Create as many assistants as you want, and use them from the [Assist dialog] as well as voice assistant hardware for Home Assistant.
Interacting with many different services means that many different things can go wrong. To help users figure out what went wrong, we've built extensive debug tooling for voice assistants into Home Assistant. You can always inspect the last 10 interactions per voice assistant.
<p class='img'>
<img src='/images/blog/2023-04-27-year-of-the-voice-chapter-2/assist-debug.png'>
Screenshot of the new Assist debug tool.
</p>
[Assist Pipeline integration]: https://next.home-assistant.io/integrations/assist_pipeline/
[Assist dialog]: /docs/assist/
## Voice Assistant powered by Home Assistant Cloud
The [Home Assistant Cloud][nc] subscription includes, in addition to an end-to-end encrypted remote connection, state-of-the-art speech-to-text and text-to-speech services. This allows your voice assistant to speak 130+ languages (including dialects like Peruvian Spanish) and is extremely fast to respond. Sample:
<audio preload controls src="/images/assist/ha_cloud.mp3"></audio>
As a subscriber, you can directly start using voice in Home Assistant. You will not need any extra hardware or software to get started.
In addition to high-quality speech-to-text and text-to-speech for your voice assistants, you will also be supporting the development of Home Assistant itself.
[Join Home Assistant Cloud today][nc]
[nc]: https://www.nabucasa.com
## The fully local voice assistant
With Home Assistant, you can be guaranteed two things: there will be options, and one of those options will be local. With our voice assistant, that's no different.
### Piper: our new model for high quality local text-to-speech
To make high-quality local text-to-speech possible, we've had to create our own text-to-speech system, optimized for running on a Raspberry Pi 4. It's called Piper.
<img style='width: 100%' src='/images/assist/piper-logo.svg' alt='Piper logo' class='no-shadow'>
Piper uses [modern machine learning algorithms][mm-algo] for realistic-sounding speech but can still generate audio quickly. On a Raspberry Pi 4, Piper can generate 2 seconds of audio with only 1 second of processing time. More powerful CPUs, such as the Intel Core i5, can generate 17 seconds of audio in the same amount of time. Sample:
<audio preload controls src="/images/assist/piper.wav"></audio>
_For more samples, see [the Piper website][piper-samples]_
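The Piper timing figures can be expressed as a real-time factor (processing time divided by audio duration), where a value below 1.0 means speech is generated faster than it plays back. The short Python sketch below works through the arithmetic; the function name is hypothetical, and it assumes "the same amount of time" for the Core i5 means 1 second of processing.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Raspberry Pi 4: 2 seconds of audio from 1 second of processing.
pi4_rtf = real_time_factor(1.0, 2.0)

# Intel Core i5: 17 seconds of audio from 1 second of processing (assumed reading).
i5_rtf = real_time_factor(1.0, 17.0)

# How many times faster the i5 is than the Pi 4 under these figures.
speedup = pi4_rtf / i5_rtf
```

Under these assumptions the Pi 4 has a real-time factor of 0.5, the Core i5 roughly 0.06, and the Core i5 is about 8.5 times faster.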
An {% my supervisor_addon addon="core_piper" title="add-on with Piper" %} is available now for Home Assistant with [over 40 voices across 18 languages][piper-samples], including: Catalan, Danish, German, English, Spanish, Finnish, French, Greek, Italian, Kazakh, Nepali, Dutch, Norwegian, Polish, Brazilian Portuguese, Ukrainian, Vietnamese, and Chinese. Voices for Piper are trained from [open audio datasets][open-audio], many of which come from [free audiobooks read by volunteers][audiobook]. If you're interested in contributing your voice, [let us know!][contact]
[mm-algo]: https://github.com/jaywalnut310/vits/
[piper-samples]: https://rhasspy.github.io/piper-samples
[open-audio]: http://www.openslr.org/
[audiobook]: https://librivox.org/
[contact]: mailto:voice@nabucasa.com
### Local speech-to-text with OpenAI Whisper
[Whisper] is an open source speech-to-text model created by OpenAI that runs locally. Since its release in 2022, Whisper has been improved by the open source community to run on less powerful hardware by projects such as [whisper.cpp] and [faster-whisper]. In less than a year of progress, Whisper is now capable of providing speech-to-text for [dozens of languages][whisper-lang] on small servers and single-board computers!
An {% my supervisor_addon addon="core_whisper" title="add-on using faster-whisper" %} is available now for Home Assistant. On a Raspberry Pi 4, voice commands can take around 7 seconds to process with about 200 MB of RAM used. An Intel Core i5 CPU or better is capable of sub-second response times and can run larger (and more accurate) versions of Whisper.
[Whisper]: https://github.com/openai/whisper
[whisper-lang]: https://github.com/openai/whisper#available-models-and-languages
[whisper.cpp]: https://github.com/ggerganov/whisper.cpp
[faster-whisper]: https://github.com/guillaumekln/faster-whisper/
## Wyoming: the voice assistant glue
Voice assistants share many common functions, such as speech-to-text, intent-recognition, and text-to-speech. We created the [Wyoming protocol][Wyoming] to provide a small set of standard messages for talking to voice assistant services, including the ability to stream audio.
Wyoming allows developers to focus on the core of a voice service without having to commit to a specific networking stack like HTTP or MQTT. This protocol is compatible with the upcoming [version 3.0 of Rhasspy][Rhasspy], so both projects can share voice services.
With Wyoming, we're trying to kickstart a more interoperable open voice ecosystem that makes sharing components across projects and platforms easy. Developers and scientists wishing to experiment with new voice technologies need only implement a small set of messages to integrate with other voice assistant projects.
The Whisper and Piper add-ons mentioned above are integrated into Home Assistant via the new [Wyoming integration]. Wyoming services can also be run on other machines and still integrate into Home Assistant.
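To give a feel for what "a small set of standard messages" over an arbitrary transport can look like, here is a hedged Python sketch of one possible framing: a JSON header line followed by an optional binary payload. This is an illustration only and should not be read as the actual Wyoming wire format (see the linked protocol document for that); the function names, header fields, and message type names below are all assumptions made for the example.

```python
import io
import json
from typing import BinaryIO, Optional, Tuple


def write_message(stream: BinaryIO, msg_type: str, data: dict, payload: bytes = b"") -> None:
    """Write one message: a JSON header line, then the raw payload bytes."""
    header = {"type": msg_type, "data": data, "payload_length": len(payload)}
    stream.write(json.dumps(header).encode("utf-8") + b"\n")
    stream.write(payload)


def read_message(stream: BinaryIO) -> Optional[Tuple[str, dict, bytes]]:
    """Read one message, or return None when the stream is exhausted."""
    line = stream.readline()
    if not line:
        return None
    header = json.loads(line)
    payload = stream.read(header["payload_length"])
    return header["type"], header["data"], payload


# Any byte stream works (socket file, pipe, stdin/stdout); BytesIO keeps this runnable.
buffer = io.BytesIO()
write_message(buffer, "audio-chunk", {"rate": 16000}, b"\x00\x01\x02\x03")
write_message(buffer, "transcript", {"text": "turn on the light"})

buffer.seek(0)
first = read_message(buffer)
second = read_message(buffer)
end = read_message(buffer)
```

Since the framing only needs a readable and writable byte stream, the same two functions would work over a TCP socket or a subprocess pipe, which is the kind of transport independence described above.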
[Wyoming]: https://github.com/rhasspy/rhasspy3/blob/master/docs/wyoming.md
[Rhasspy]: https://github.com/rhasspy/rhasspy3/
[Wyoming integration]: https://next.home-assistant.io/integrations/wyoming/
## ESPHome powered voice assistants
[ESPHome] is our software for microcontrollers. Instead of programming, users define how their sensors are connected in a YAML file. ESPHome will read this file and generate and install software on your microcontroller to make this data accessible in Home Assistant.
Today we're launching support for building voice assistants using ESPHome. Connect a microphone to your ESPHome device, and you can control your smart home with your voice. Include a speaker and the smart home will speak back.
<lite-youtube videoid="w6QxGdxVMJs" videotitle="$13 voice remote for Home Assistant"></lite-youtube>
We've been focusing on the [M5STACK ATOM Echo][atom-echo] for testing and development. For $13 it comes with a microphone and a speaker in a nice little box. We've created a tutorial to turn this device into a voice remote directly from your browser!
[Tutorial: create a $13 voice remote for Home Assistant.](https://next.home-assistant.io/projects/thirteen-usd-voice-remote/)
[ESPHome Voice Assistant documentation.](https://esphome.io/components/voice_assistant.html)
[ESPHome]: https://esphome.io
[atom-echo]: https://shop.m5stack.com/products/atom-echo-smart-speaker-dev-kit?ref=NabuCasa
## World's Most Private Voice Assistant
If you were designing the world's most private voice assistant, what features would it have? To start, it should only listen when you're ready to talk, rather than all the time. And when it responds, you should be the only one to hear it. This sounds strangely familiar…🤔
A phone! No, not the featureless rectangle you have in your pocket; an analog phone. These great creatures once ruled the Earth with twisty cords and unique looks to match your style. Analog phones have a familiar interface that's hard to beat: pick up the phone to listen/speak and put it down when done.
With Home Assistant's new [Voice-over-IP integration][voip], you can now use an “old school” phone to control your smart home!
<lite-youtube videoid="0YJzLIMrnGk" videotitle="World's Most Private Voice Assistant"></lite-youtube>
By configuring off-hook autodial, your phone will automatically call Home Assistant when you pick it up. Speak your voice command or question, and listen for the response. The conversation will continue as long as you please: speak more commands/questions, or simply hang up. Assign a unique voice assistant/pipeline to each VoIP adapter, enabling dedicated phones for specific languages.
We've focused our initial efforts on supporting [the Grandstream HT801 Voice-over-IP box][ht801]. It works with any phone with an RJ11 connector, and connects directly to Home Assistant. There is no need for an extra server.
[Tutorial: create your own World's Most Private Voice Assistant](https://next.home-assistant.io/projects/worlds-most-private-voice-assistant/)
<p class='img'>
<lite-youtube videoid="eLx8_NAqptk" videotitle="World's Most Private Voice Assistant meets ChatGPT"></lite-youtube>
Give your voice assistant personality using the OpenAI integration.
</p>
[voip]: https://next.home-assistant.io/integrations/voip/
[ht801]: https://amzn.to/40k7mRa
_Some links on this page are affiliate links and purchases using these links support the Home Assistant project._

Binary file not shown.

@ -0,0 +1,141 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:cc="http://creativecommons.org/ns#"
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:svg="http://www.w3.org/2000/svg"
xmlns="http://www.w3.org/2000/svg"
xmlns:sodipodi="http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd"
xmlns:inkscape="http://www.inkscape.org/namespaces/inkscape"
width="118.3606mm"
height="41.577671mm"
viewBox="0 0 118.36058 41.577671"
version="1.1"
id="svg120"
inkscape:version="1.0.2 (e86c870879, 2021-01-15)"
sodipodi:docname="logo.svg"
inkscape:export-filename="./logo.png"
inkscape:export-xdpi="100"
inkscape:export-ydpi="100">
<defs
id="defs114" />
<sodipodi:namedview
id="base"
pagecolor="#ffffff"
bordercolor="#666666"
borderopacity="1.0"
inkscape:pageopacity="1"
inkscape:pageshadow="2"
inkscape:zoom="2"
inkscape:cx="45.74006"
inkscape:cy="103.62972"
inkscape:document-units="mm"
inkscape:current-layer="layer1"
inkscape:document-rotation="0"
showgrid="false"
inkscape:window-width="1920"
inkscape:window-height="1012"
inkscape:window-x="0"
inkscape:window-y="0"
inkscape:window-maximized="1"
fit-margin-top="2"
fit-margin-left="2"
fit-margin-right="2"
fit-margin-bottom="2"
inkscape:snap-global="false" />
<metadata
id="metadata117">
<rdf:RDF>
<cc:Work
rdf:about="">
<dc:format>image/svg+xml</dc:format>
<dc:type
rdf:resource="http://purl.org/dc/dcmitype/StillImage" />
<dc:title></dc:title>
</cc:Work>
</rdf:RDF>
</metadata>
<g
inkscape:label="Layer 1"
inkscape:groupmode="layer"
id="layer1"
transform="translate(-46.653036,-127.37783)">
<path
d="m 71.560318,141.48698 c 2.276589,0.19412 4.076186,0.84372 5.39879,1.94879 1.615007,1.33 2.42251,3.23001 2.42251,5.70002 0,2.48057 -0.807503,4.39113 -2.42251,5.73169 -1.604451,1.33001 -3.910849,1.99501 -6.919195,1.99501 H 66.01823 v 8.2017 h -6.095858 v -16.59136 c 3.797355,-2.67672 7.653774,-5.10258 11.637946,-6.98585 z m -5.542088,4.35546 v 6.60253 h 3.372514 c 1.182227,0 2.095286,-0.285 2.739178,-0.855 0.643891,-0.58056 0.965837,-1.39862 0.965837,-2.45418 0,-1.05556 -0.321946,-1.86834 -0.965837,-2.43834 -0.643892,-0.57001 -1.556951,-0.85501 -2.739178,-0.85501 z"
style="font-style:normal;font-variant:normal;font-weight:bold;font-stretch:normal;font-size:32.4268px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Bold';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-variant-east-asian:normal;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
id="path2223"
sodipodi:nodetypes="ccscscccccccscsssc" />
<path
d="m 95.75335,141.42493 h 10.11754 q 4.51252,0 6.9192,2.01084 2.42251,1.995 2.42251,5.70002 0,3.72085 -2.42251,5.73169 -2.40668,1.99501 -6.9192,1.99501 h -4.02168 v 8.2017 h -6.09586 z m 6.09586,4.41751 v 6.60253 h 3.37251 q 1.77334,0 2.73918,-0.855 0.96584,-0.87084 0.96584,-2.45418 0,-1.58334 -0.96584,-2.43834 -0.96584,-0.85501 -2.73918,-0.85501 z"
style="font-style:normal;font-variant:normal;font-weight:bold;font-stretch:normal;font-size:32.4268px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Bold';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-variant-east-asian:normal;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
id="path2225" />
<path
d="m 119.51928,141.42493 h 16.4509 v 4.60751 h -10.35504 v 4.40169 h 9.73754 v 4.60752 h -9.73754 v 5.41502 h 10.70338 v 4.60752 h -16.79924 z"
style="font-style:normal;font-variant:normal;font-weight:bold;font-stretch:normal;font-size:32.4268px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Bold';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-variant-east-asian:normal;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
id="path2227" />
<path
d="m 150.33107,151.90663 q 1.91585,0 2.73918,-0.7125 0.83917,-0.7125 0.83917,-2.34334 0,-1.61501 -0.83917,-2.31168 -0.82333,-0.69667 -2.73918,-0.69667 h -2.56501 v 6.06419 z m -2.56501,4.21169 v 8.94587 h -6.09585 v -23.63926 h 9.31003 q 4.67086,0 6.84003,1.5675 2.18501,1.56751 2.18501,4.95586 0,2.34334 -1.14,3.84751 -1.12417,1.50417 -3.40418,2.21668 1.25083,0.285 2.2325,1.29834 0.99751,0.9975 2.01085,3.04001 l 3.30918,6.71336 h -6.4917 L 153.64025,159.19 q -0.87083,-1.77334 -1.77334,-2.42251 -0.88667,-0.64917 -2.37501,-0.64917 z"
style="font-style:normal;font-variant:normal;font-weight:bold;font-stretch:normal;font-size:32.4268px;line-height:1.25;font-family:sans-serif;-inkscape-font-specification:'sans-serif, Bold';font-variant-ligatures:normal;font-variant-caps:normal;font-variant-numeric:normal;font-variant-east-asian:normal;fill:#000000;fill-opacity:1;stroke:none;stroke-width:0.264583"
id="path2229" />
<g
id="g2239"
transform="translate(42.041608,11.459915)">
<path
style="fill:#000000;stroke:none;stroke-width:0.264583px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 27.894254,129.39607 c -3.25161,-4.01384 -2.65489,-6.79285 -0.728638,-9.09261 -2.134965,-0.0137 -4.642444,-0.12021 -6.370534,4.67585 -4.134299,0.0803 -4.437171,-3.11951 -4.48854,-6.20362 -1.859141,2.96638 -2.878913,5.02914 -1.495979,9.34664 -4.921996,-1.38523 -5.5668734,2.41507 -7.6020931,4.32371 3.9580871,-2.05625 8.8579831,-2.84843 10.5409231,3.09271 2.800323,-2.02787 5.8308,-4.05685 10.168201,-6.09213 0.01737,-0.008 -0.01127,-0.0357 -0.02334,-0.0505 z"
id="path2231"
sodipodi:nodetypes="sccccccsss" />
<circle
style="fill:#000000;stroke:none;stroke-width:0.1;stroke-linecap:round"
id="circle2233"
cx="7.4886017"
cy="132.36996"
r="0.87717384" />
<circle
style="fill:#000000;stroke:none;stroke-width:0.1;stroke-linecap:round"
id="circle2235"
cx="16.220198"
cy="118.79509"
r="0.87717384" />
<circle
style="fill:#000000;stroke:none;stroke-width:0.1;stroke-linecap:round"
id="circle2237"
cx="26.696749"
cy="120.39225"
r="0.87717384" />
</g>
<g
id="g235">
<path
id="rect185"
style="opacity:1;fill:#000000;stroke:none;stroke-width:0.25;stroke-linecap:round"
d="m 83.615347,147.24968 c 0.360734,-0.12828 0.577533,-0.20074 1.253964,-0.36863 l 6.43e-4,9.22145 h -1.253759 z"
sodipodi:nodetypes="ccccc" />
<path
id="path188"
style="opacity:1;fill:#000000;stroke:none;stroke-width:0.25;stroke-linecap:round"
d="m 89.709902,147.2435 c -0.360734,-0.12828 -0.597295,-0.20435 -1.302474,-0.36245 l 0.0024,18.18238 h 1.301301 z"
sodipodi:nodetypes="ccccc" />
<path
id="rect190"
style="opacity:1;fill:#000000;stroke:none;stroke-width:0.25;stroke-linecap:round"
d="m 85.866016,146.73318 c 0.511315,-0.051 1.016416,-0.0516 1.545684,-3.2e-4 l 0.001,12.16841 h -1.544594 z"
sodipodi:nodetypes="ccccc" />
<path
style="opacity:1;fill:#000000;stroke:none;stroke-width:0.264583px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 83.616794,145.83474 c 0.411637,-0.13887 0.826403,-0.26798 1.252569,-0.36151 l 0.0015,-4.96632 c -0.386445,0.10825 -1.112359,0.30313 -1.253868,0.82727 z"
id="path193"
sodipodi:nodetypes="ccccc" />
<path
style="opacity:1;fill:#000000;stroke:none;stroke-width:0.264583px;stroke-linecap:butt;stroke-linejoin:miter;stroke-opacity:1"
d="m 89.712432,145.76508 c -0.433673,-0.15205 -0.85495,-0.27423 -1.303915,-0.36776 l -5.66e-4,-4.93015 c 0.303265,0.0723 1.230053,0.28565 1.305315,0.91864 z"
id="path195"
sodipodi:nodetypes="ccccc" />
<path
id="rect197"
style="opacity:1;fill:#000000;stroke:none;stroke-width:0.25;stroke-linecap:round"
d="m 85.868347,140.33989 c 0.515486,-0.0514 1.169823,-0.0536 1.542039,-0.0143 l 0.0013,4.94152 c -0.554246,-0.0467 -1.054243,-0.0116 -1.545581,0.0365 z"
sodipodi:nodetypes="ccccc" />
</g>
</g>
</svg>
