diff --git a/source/_integrations/scrape.markdown b/source/_integrations/scrape.markdown
index 1d6c0744e3e..a4529bc49cb 100644
--- a/source/_integrations/scrape.markdown
+++ b/source/_integrations/scrape.markdown
@@ -5,8 +5,10 @@ ha_category:
   - Sensor
 ha_release: 0.31
 ha_iot_class: Cloud Polling
+ha_config_flow: true
 ha_codeowners:
   - '@fabaff'
+  - '@gjohansson-ST'
 ha_domain: scrape
 ha_platforms:
   - sensor
@@ -15,85 +17,15 @@ ha_integration_type: integration
 
 The `scrape` sensor platform is scraping information from websites. The sensor loads an HTML page and gives you the option to search and split out a value. As this is not a full-blown web scraper like [scrapy](https://scrapy.org/), it will most likely only work with simple web pages and it can be time-consuming to get the right section.
 
+Check Beautifulsoup's [CSS selectors](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors) for details on how to write a **Select**.
+
 If you are not using Home Assistant Container or Home Assistant Operating System, this integration requires `libxml2` to be installed. On Debian based installs, run:
 
 ```bash
 sudo apt install libxml2
 ```
 
-To enable this sensor, add the following lines to your `configuration.yaml` file:
-
-```yaml
-# Example configuration.yaml entry
-sensor:
-  - platform: scrape
-    resource: https://www.home-assistant.io
-    select: ".current-version h1"
-```
-
-{% configuration %}
-resource:
-  description: The URL to the website that contains the value.
-  required: true
-  type: string
-select:
-  description: "Defines the HTML tag to search for. Check Beautifulsoup's [CSS selectors](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors) for details."
-  required: true
-  type: string
-attribute:
-  description: Get value of an attribute on the selected tag.
-  required: false
-  type: string
-index:
-  description: Defines which of the elements returned by the CSS selector to use.
-  required: false
-  default: 0
-  type: integer
-name:
-  description: Name of the sensor.
-  required: false
-  default: Web scrape
-  type: string
-value_template:
-  description: Defines a template to get the state of the sensor.
-  required: false
-  type: template
-unit_of_measurement:
-  description: Defines the units of measurement of the sensor, if any.
-  required: false
-  type: string
-device_class:
-  description: The [type/class](/integrations/sensor/#device-class) of the sensor to set the icon in the frontend.
-  required: false
-  type: device_class
-  default: None
-state_class:
-  description: The [state_class](https://developers.home-assistant.io/docs/core/entity/sensor#available-state-classes) of the sensor.
-  required: false
-  type: string
-  default: None
-authentication:
-  description: Type of the HTTP authentication. Either `basic` or `digest`.
-  required: false
-  type: string
-verify_ssl:
-  description: Enables/disables verification of SSL-certificate, for example if it is self-signed.
-  required: false
-  type: boolean
-  default: true
-username:
-  description: The username for accessing the website.
-  required: false
-  type: string
-password:
-  description: The password for accessing the website.
-  required: false
-  type: string
-headers:
-  description: Headers to use for the web request.
-  required: false
-  type: string
-{% endconfiguration %}
+{% include integrations/config_flow.md %}
 
 ## Examples
 
@@ -103,97 +35,67 @@ In this section you find some real-life examples of how to use this sensor. Ther
 
 The current release Home Assistant is published on [https://www.home-assistant.io/](/)
 
-{% raw %}
-
-```yaml
-sensor:
-# Example configuration.yaml entry
-  - platform: scrape
-    resource: https://www.home-assistant.io
-    name: Release
-    select: ".current-version h1"
-    value_template: '{{ value.split(":")[1] }}'
-```
-
-{% endraw %}
+| Field | Value |
+| --- | --- |
+| **Resource** | https://www.home-assistant.io |
+| **Name** | Release |
+| **Select** | `.current-version h1` |
+| **Value Template** | {% raw %}`{{ value.split(':')[1] }}`{% endraw %} |
 
 ### Available implementations
 
 Get the counter for all our implementations from the [Component overview](/integrations/) page.
 
-{% raw %}
-
-```yaml
-# Example configuration.yaml entry
-sensor:
-  - platform: scrape
-    resource: https://www.home-assistant.io/integrations/
-    name: Home Assistant impl.
-    select: 'a[href="#all"]'
-    value_template: '{{ value.split("(")[1].split(")")[0] }}'
-```
-
-{% endraw %}
+| Field | Value |
+| --- | --- |
+| **Resource** | https://www.home-assistant.io/integrations/ |
+| **Name** | Home Assistant impl. |
+| **Select** | `a[href="#all"]` |
+| **Value Template** | {% raw %}`{{ value.split('(')[1].split(')')[0] }}`{% endraw %} |
 
 ### Get a value out of a tag
 
 The German [Federal Office for Radiation protection (Bundesamt für Strahlenschutz)](http://www.bfs.de/) is publishing various details about optical radiation including an UV index. This example is getting the index for a region in Germany.
 
-```yaml
-# Example configuration.yaml entry
-sensor:
-  - platform: scrape
-    resource: http://www.bfs.de/DE/themen/opt/uv/uv-index/prognose/prognose_node.html
-    name: Coast Ostsee
-    select: "p"
-    index: 19
-    unit_of_measurement: "UV Index"
-```
+| Field | Value |
+| --- | --- |
+| **Resource** | http://www.bfs.de/DE/themen/opt/uv/uv-index/prognose/prognose_node.html |
+| **Name** | Coast Ostsee |
+| **Select** | `p` |
+| **Index** | `19` |
+| **Unit of Measurement** | `UV Index` |
 
 ### IFTTT status
 
 If you make heavy use of the [IFTTT](/integrations/ifttt/) web service for your automations and are curious about the [status of IFTTT](https://status.ifttt.com/) then you can display the current state of IFTTT in your frontend.
 
-```yaml
-# Example configuration.yaml entry
-sensor:
-  - platform: scrape
-    resource: https://status.ifttt.com/
-    name: IFTTT status
-    select: ".component-status"
-```
+| Field | Value |
+| --- | --- |
+| **Resource** | https://status.ifttt.com/ |
+| **Name** | IFTTT status |
+| **Select** | `.component-status` |
 
 ### Get the latest podcast episode file URL
 
 If you want to get the file URL for the latest episode of your [favorite podcast](https://hasspodcast.io/), so you can pass it on to a compatible media player.
 
-```yaml
-# Example configuration.yaml entry
-sensor:
-  - platform: scrape
-    resource: https://hasspodcast.io/feed/podcast
-    name: Home Assistant Podcast
-    select: "enclosure"
-    index: 1
-    attribute: url
-```
+| Field | Value |
+| --- | --- |
+| **Resource** | https://hasspodcast.io/feed/podcast |
+| **Name** | Home Assistant Podcast |
+| **Select** | `enclosure` |
+| **Index** | `1` |
+| **Attribute** | `url` |
 
 ### Energy price
 
 This example tries to retrieve the price for electricity.
 
-{% raw %}
-
-```yaml
-# Example configuration.yaml entry
-sensor:
-  - platform: scrape
-    resource: https://elen.nu/timpriser-pa-el-for-elomrade-se3-stockholm/
-    name: Electricity price
-    select: ".text-lg:is(span)"
-    index: 1
-    value_template: '{{ value | replace (",", ".") | float }}'
-    unit_of_measurement: "öre/kWh"
-```
-
-{% endraw %}
+| Field | Value |
+| --- | --- |
+| **Resource** | https://elen.nu/timpriser-pa-el-for-elomrade-se3-stockholm/ |
+| **Name** | Electricity price |
+| **Select** | `.text-lg:is(span)` |
+| **Index** | `1` |
+| **Value Template** | {% raw %}`{{ value \| replace(',', '.') \| float }}`{% endraw %} |
+| **Unit of Measurement** | `öre/kWh` |
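
The **Value Template** entries in the tables above are Jinja expressions that read much like Python string methods. As a rough illustration only, the sketch below mirrors three of them in plain Python. The function names and the sample input strings are hypothetical and are not taken from the scraped pages. Home Assistant evaluates the real templates with Jinja, so edge cases (for example, how `float` treats input that is not a number) may behave differently.

```python
# Plain-Python mirrors of the three value templates from the examples.
# All function names and sample inputs are hypothetical illustrations.


def release_version(value: str) -> str:
    """Mirror of {{ value.split(':')[1] }}: the text after the first colon."""
    return value.split(":")[1]


def implementation_count(value: str) -> str:
    """Mirror of {{ value.split('(')[1].split(')')[0] }}: text inside the first parentheses."""
    return value.split("(")[1].split(")")[0]


def electricity_price(value: str) -> float:
    """Mirror of {{ value | replace(',', '.') | float }}: decimal comma to float."""
    return float(value.replace(",", "."))


if __name__ == "__main__":
    # Assumed sample strings; the live pages may return different text.
    print(repr(release_version("Current Version: 2022.6.0")))
    print(repr(implementation_count("All (1900)")))
    print(repr(electricity_price("123,45")))
```

In this Python mirror, the first result keeps the space that follows the colon, because `split(":")` does not trim whitespace.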