---
title: Google Cloud Platform
description: Google Cloud Platform integration.
ha_category:
  - Text-to-speech
ha_release: 0.95
ha_iot_class: Cloud Push
ha_codeowners:
  - '@lufton'
ha_domain: google_cloud
ha_platforms:
  - tts
ha_integration_type: integration
---

The google_cloud platform lets you use Google Cloud Platform APIs and integrate them into Home Assistant.

Configuration

To use Google Cloud Platform, you need to provide the path to your API key file, relative to the Home Assistant config directory. Place the file in the config folder and set the key_file parameter in configuration.yaml:

# Example configuration.yaml entry
tts:
  - platform: google_cloud
    key_file: googlecloud.json
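
According to the key_file description in the configuration reference, the parameter is optional: when it is omitted, the path is read from the GOOGLE_APPLICATION_CREDENTIALS environment variable. A minimal sketch of that variant, assuming the variable is already set in the environment that runs Home Assistant:

```yaml
# Example configuration.yaml entry without key_file.
# Assumes GOOGLE_APPLICATION_CREDENTIALS points to the downloaded JSON key file.
tts:
  - platform: google_cloud
```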

Obtaining an API key

The process for obtaining an API key is described in the documentation of the corresponding API.

Basic instructions for all APIs:

  1. Visit Cloud Resource Manager.

  2. Click the CREATE PROJECT button at the top.

  3. Enter a convenient project name and click the CREATE button.

  4. Make sure that billing is enabled for your Google Cloud Platform project.

  5. Enable the Cloud API you need by visiting one of the links below or the APIs library, selecting your project from the dropdown list, and clicking the Continue button:

  6. Set up authentication:

    1. Visit this link
    2. From the Service account list, select New service account.
    3. In the Service account name field, enter any name.

    If you are requesting a text-to-speech API key:

    1. Don't select a value from the Role list. No role is required to access this service.
    2. Click Create. A note appears, warning that this service account has no role.
    3. Click Create without role. A JSON file that contains your API key downloads to your computer.

Google Cloud text-to-speech

Google Cloud text-to-speech converts text into human-like speech in more than 100 voices across 20+ languages and variants. It applies groundbreaking research in speech synthesis (WaveNet) and Google's powerful neural networks to deliver high-fidelity audio. With this easy-to-use API, you can create lifelike interactions with your users that transform customer service, device interaction, and other applications.

Pricing

The Cloud text-to-speech API is billed monthly based on the number of characters sent to the service to be synthesized into audio.

| Feature | Monthly free tier | Paid usage |
| --- | --- | --- |
| Standard (non-WaveNet) voices | 0 to 4 million characters | $4.00 USD / 1 million characters |
| WaveNet voices | 0 to 1 million characters | $16.00 USD / 1 million characters |

Text-to-speech configuration

{% configuration %}
key_file:
  description: "The API key file to use with Google Cloud Platform. If not specified, the path in os.environ['GOOGLE_APPLICATION_CREDENTIALS'] is used."
  required: false
  type: string
language:
  description: "Default language of the voice, e.g., en-US. Supported languages, genders, and voices are listed here. Some additional languages are supported but not documented (see dropdown here)."
  required: false
  type: string
  default: en-US
gender:
  description: "Default gender of the voice, e.g., male. Supported languages, genders, and voices are listed here."
  required: false
  type: string
  default: neutral
voice:
  description: "Default voice name, e.g., en-US-Wavenet-F. Supported languages, genders, and voices are listed here. Important! If set, this parameter overrides the language and gender parameters."
  required: false
  type: string
encoding:
  description: "Default audio encoder. Supported encodings are ogg_opus, mp3, and linear16."
  required: false
  type: string
  default: mp3
speed:
  description: "Default rate/speed of the voice, in the range [0.25, 4.0]. 1.0 is the normal native speed supported by the specific voice. 2.0 is twice as fast, and 0.5 is half as fast. If unset (0.0), defaults to the native 1.0 speed."
  required: false
  type: float
  default: 1.0
pitch:
  description: "Default pitch of the voice, in the range [-20.0, 20.0]. 20 means an increase of 20 semitones from the original pitch. -20 means a decrease of 20 semitones from the original pitch."
  required: false
  type: float
  default: 0.0
gain:
  description: "Default volume gain (in dB) of the voice, in the range [-96.0, 16.0]. If unset, or set to a value of 0.0 (dB), audio plays at the normal native signal amplitude. A value of -6.0 (dB) plays at approximately half the normal native signal amplitude. A value of +6.0 (dB) plays at approximately twice the normal native signal amplitude. It is strongly recommended not to exceed +10 (dB), as there is usually no effective increase in loudness for any value greater than that."
  required: false
  type: float
  default: 0.0
profiles:
  description: "Identifiers that select 'audio effects' profiles, which are applied to the (post-synthesized) text-to-speech audio. Effects are applied on top of each other in the order they are given. Supported profile IDs are listed here."
  required: false
  type: list
  default: "[]"
text_type:
  description: "Default text type. Supported text types are text and ssml. Read more about SSML and how to use it here."
  required: false
  type: string
  default: "text"
{% endconfiguration %}
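
Because voice overrides language and gender when it is set, a configuration that names a specific voice can leave those two parameters out. The following is only a sketch under that reading of the descriptions above; the speed, pitch, and gain values are arbitrary illustrations chosen from inside the documented ranges:

```yaml
# Sketch: only voice is set, so language and gender are omitted.
tts:
  - platform: google_cloud
    key_file: googlecloud.json
    voice: en-US-Wavenet-F
    speed: 1.25   # within [0.25, 4.0]; 1.0 is the native speed
    pitch: 2.0    # semitones, within [-20.0, 20.0]
    gain: -6.0    # dB, approximately half the native amplitude
```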

Full configuration example

A full Google Cloud text-to-speech configuration can look like this:

# Example configuration.yaml entry
tts:
  - platform: google_cloud
    key_file: googlecloud.json
    language: en-US
    gender: male
    voice: en-US-Wavenet-F
    encoding: linear16
    speed: 0.9
    pitch: -2.5
    gain: -5.0
    text_type: ssml
    profiles:
      - telephony-class-application
      - wearable-class-device
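
With text_type set to ssml as in the example above, messages passed to the platform are expected to be SSML rather than plain text. A hedged sketch of such a service call follows. It assumes the platform is exposed as tts.google_cloud_say, following the usual tts.<platform>_say naming for TTS platforms. The media player entity is a hypothetical placeholder.

```yaml
# Hypothetical service call; media_player.living_room is a placeholder entity.
service: tts.google_cloud_say
data:
  entity_id: media_player.living_room
  # The message is SSML: a <speak> root element with a half-second pause.
  message: '<speak>The front door is open. <break time="500ms"/> Please close it.</speak>'
```

If text_type is left at its default of text, the same message would be treated as plain text, so the markup should only be used together with the ssml setting.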