diff --git a/source/_integrations/scrape.markdown b/source/_integrations/scrape.markdown
index a4529bc49cb..1d6c0744e3e 100644
--- a/source/_integrations/scrape.markdown
+++ b/source/_integrations/scrape.markdown
@@ -5,10 +5,8 @@ ha_category:
   - Sensor
 ha_release: 0.31
 ha_iot_class: Cloud Polling
-ha_config_flow: true
 ha_codeowners:
   - '@fabaff'
-  - '@gjohansson-ST'
 ha_domain: scrape
 ha_platforms:
   - sensor
@@ -17,15 +15,85 @@ ha_integration_type: integration
 
 The `scrape` sensor platform is scraping information from websites. The sensor loads an HTML page and gives you the option to search and split out a value. As this is not a full-blown web scraper like [scrapy](https://scrapy.org/), it will most likely only work with simple web pages and it can be time-consuming to get the right section.
 
-Check Beautifulsoup's [CSS selectors](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors) for details on how to write a **Select**.
-
 If you are not using Home Assistant Container or Home Assistant Operating System, this integration requires `libxml2` to be installed. On Debian based installs, run:
 
 ```bash
 sudo apt install libxml2
 ```
 
-{% include integrations/config_flow.md %}
+To enable this sensor, add the following lines to your `configuration.yaml` file:
+
+```yaml
+# Example configuration.yaml entry
+sensor:
+  - platform: scrape
+    resource: https://www.home-assistant.io
+    select: ".current-version h1"
+```
+
+{% configuration %}
+resource:
+  description: The URL to the website that contains the value.
+  required: true
+  type: string
+select:
+  description: "Defines the HTML tag to search for. Check Beautifulsoup's [CSS selectors](https://www.crummy.com/software/BeautifulSoup/bs4/doc/#css-selectors) for details."
+  required: true
+  type: string
+attribute:
+  description: Get value of an attribute on the selected tag.
+  required: false
+  type: string
+index:
+  description: Defines which of the elements returned by the CSS selector to use.
+  required: false
+  default: 0
+  type: integer
+name:
+  description: Name of the sensor.
+  required: false
+  default: Web scrape
+  type: string
+value_template:
+  description: Defines a template to get the state of the sensor.
+  required: false
+  type: template
+unit_of_measurement:
+  description: Defines the units of measurement of the sensor, if any.
+  required: false
+  type: string
+device_class:
+  description: The [type/class](/integrations/sensor/#device-class) of the sensor to set the icon in the frontend.
+  required: false
+  type: device_class
+  default: None
+state_class:
+  description: The [state_class](https://developers.home-assistant.io/docs/core/entity/sensor#available-state-classes) of the sensor.
+  required: false
+  type: string
+  default: None
+authentication:
+  description: Type of the HTTP authentication. Either `basic` or `digest`.
+  required: false
+  type: string
+verify_ssl:
+  description: Enables/disables verification of SSL-certificate, for example if it is self-signed.
+  required: false
+  type: boolean
+  default: true
+username:
+  description: The username for accessing the website.
+  required: false
+  type: string
+password:
+  description: The password for accessing the website.
+  required: false
+  type: string
+headers:
+  description: Headers to use for the web request.
+  required: false
+  type: string
+{% endconfiguration %}
 
 ## Examples
 
@@ -35,67 +103,97 @@ In this section you find some real-life examples of how to use this sensor. Ther
 
 The current release Home Assistant is published on [https://www.home-assistant.io/](/)
 
-| Field | Value |
-| --- | --- |
-| **Resource** | https://www.home-assistant.io |
-| **Name** | Release |
-| **Select** | `.current-version h1` |
-| **Value Template** | {% raw %}`{{ value.split(':')[1] }}`{% endraw %} |
+{% raw %}
+
+```yaml
+sensor:
+# Example configuration.yaml entry
+  - platform: scrape
+    resource: https://www.home-assistant.io
+    name: Release
+    select: ".current-version h1"
+    value_template: '{{ value.split(":")[1] }}'
+```
+
+{% endraw %}
 
 ### Available implementations
 
 Get the counter for all our implementations from the [Component overview](/integrations/) page.
 
-| Field | Value |
-| --- | --- |
-| **Resource** | https://www.home-assistant.io/integrations/ |
-| **Name** | Home Assistant impl. |
-| **Select** | `a[href="#all"]` |
-| **Value Template** | {% raw %}`{{ value.split('(')[1].split(')')[0] }}`{% endraw %} |
+{% raw %}
+
+```yaml
+# Example configuration.yaml entry
+sensor:
+  - platform: scrape
+    resource: https://www.home-assistant.io/integrations/
+    name: Home Assistant impl.
+    select: 'a[href="#all"]'
+    value_template: '{{ value.split("(")[1].split(")")[0] }}'
+```
+
+{% endraw %}
 
 ### Get a value out of a tag
 
 The German [Federal Office for Radiation protection (Bundesamt für Strahlenschutz)](http://www.bfs.de/) is publishing various details about optical radiation including an UV index. This example is getting the index for a region in Germany.
 
-| Field | Value |
-| --- | --- |
-| **Resource** | http://www.bfs.de/DE/themen/opt/uv/uv-index/prognose/prognose_node.html |
-| **Name** | Coast Ostsee |
-| **Select** | `p` |
-| **Index** | `19` |
-| **Unit of Measurement** | `UV Index` |
+```yaml
+# Example configuration.yaml entry
+sensor:
+  - platform: scrape
+    resource: http://www.bfs.de/DE/themen/opt/uv/uv-index/prognose/prognose_node.html
+    name: Coast Ostsee
+    select: "p"
+    index: 19
+    unit_of_measurement: "UV Index"
+```
 
 ### IFTTT status
 
 If you make heavy use of the [IFTTT](/integrations/ifttt/) web service for your automations and are curious about the [status of IFTTT](https://status.ifttt.com/) then you can display the current state of IFTTT in your frontend.
 
-| Field | Value |
-| --- | --- |
-| **Resource** | https://status.ifttt.com/ |
-| **Name** | IFTTT status |
-| **Select** | `.component-status` |
+```yaml
+# Example configuration.yaml entry
+sensor:
+  - platform: scrape
+    resource: https://status.ifttt.com/
+    name: IFTTT status
+    select: ".component-status"
+```
 
 ### Get the latest podcast episode file URL
 
 If you want to get the file URL for the latest episode of your [favorite podcast](https://hasspodcast.io/), so you can pass it on to a compatible media player.
 
-| Field | Value |
-| --- | --- |
-| **Resource** | https://hasspodcast.io/feed/podcast |
-| **Name** | Home Assistant Podcast |
-| **Select** | `enclosure` |
-| **Index** | `1` |
-| **Attribute** | `url` |
+```yaml
+# Example configuration.yaml entry
+sensor:
+  - platform: scrape
+    resource: https://hasspodcast.io/feed/podcast
+    name: Home Assistant Podcast
+    select: "enclosure"
+    index: 1
+    attribute: url
+```
 
 ### Energy price
 
 This example tries to retrieve the price for electricity.
 
-| Field | Value |
-| --- | --- |
-| **Resource** | https://elen.nu/timpriser-pa-el-for-elomrade-se3-stockholm/ |
-| **Name** | Electricity price |
-| **Select** | `.text-lg:is(span)` |
-| **Index** | `1` |
-| **Value Template** | {% raw %}`{{ value \| replace(',', '.') \| float }}`{% endraw %} |
-| **Unit of Measurement** | `öre/kWh` |
+{% raw %}
+
+```yaml
+# Example configuration.yaml entry
+sensor:
+  - platform: scrape
+    resource: https://elen.nu/timpriser-pa-el-for-elomrade-se3-stockholm/
+    name: Electricity price
+    select: ".text-lg:is(span)"
+    index: 1
+    value_template: '{{ value | replace (",", ".") | float }}'
+    unit_of_measurement: "öre/kWh"
+```
+
+{% endraw %}
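The three `value_template` expressions in the examples can be sanity-checked outside Home Assistant. The following is a minimal sketch that approximates them with plain Python string operations rather than Jinja. The function names and the sample input strings are hypothetical stand-ins, not values actually scraped from the sites, and Python's `float()` is only a rough substitute for the template `float` filter.

```python
# Plain-Python approximations of the value_template expressions in the examples.
# Function names and sample inputs are hypothetical; real scraped text may differ.


def release_version(value: str) -> str:
    # Approximates {{ value.split(":")[1] }}:
    # the text between the first and second colon (whitespace is kept).
    return value.split(":")[1]


def implementation_count(value: str) -> str:
    # Approximates {{ value.split("(")[1].split(")")[0] }}:
    # the text inside the first pair of parentheses.
    return value.split("(")[1].split(")")[0]


def electricity_price(value: str) -> float:
    # Approximates {{ value | replace(",", ".") | float }}:
    # turn a decimal comma into a decimal point, then parse as a float.
    return float(value.replace(",", "."))


if __name__ == "__main__":
    print(repr(release_version("Current Version: 2022.6.0")))
    print(repr(implementation_count("All (1900)")))
    print(repr(electricity_price("123,45")))
```

Note that the first expression keeps any whitespace after the colon, so the resulting state may start with a space unless the template also trims it.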