python, web-scraping, python-requests-html

Web scraping using requests-html - How does one collect a simple number from a website?


I am trying to collect a data point from an electricity data website:
electricityMap | Live CO₂ emissions of electricity consumption

So far I have written this code:

    from requests_html import HTMLSession  # import libraries

    s = HTMLSession()

    url = 'https://app.electricitymap.org/zone/DK-DK2'

    r = s.get(url, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.74 Safari/537.36'})

    webpageTitle = r.html.find('title', first=True).text
    print(webpageTitle)

I am able to get VS Code to print the title of the website, but I am only interested in the current share of renewable energy. This is displayed as the "renewable" dial in the top left of the website.

I have inspected the website and found the value I am trying to collect: [Screenshot of Chrome DevTools]

What do I need to write to be able to print this value in Python?


Solution

  • As @Tim Roberts stated, the website is built entirely with JavaScript. I tested both requests_html and Selenium. requests_html returns empty output, because the plain s.get() request only downloads the HTML the server sends and never executes the page's JavaScript, whereas Selenium drives a real browser and produces the expected output.

    from requests_html import HTMLSession    # import libraries
    from bs4 import BeautifulSoup as bs

    s = HTMLSession()

    url = 'https://app.electricitymap.org/zone/DK-DK2'

    # A plain GET returns only the server-sent HTML; the page's JavaScript is not run
    r = s.get(url, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.74 Safari/537.36'})

    soup = bs(r.text, 'html.parser')
    # The gauge SVG is drawn by JavaScript, so this selector matches nothing here
    renewable = [x.get_text() for x in soup.select('g[class="circular-gauge"] text')]
    print(renewable)
    

    Output:

    []

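    # Selenium: webdriver_manager downloads a matching ChromeDriver automatically,
    # so no manual driver setup is needed (pip install selenium webdriver-manager beautifulsoup4)

    import time
    from bs4 import BeautifulSoup as bs
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from webdriver_manager.chrome import ChromeDriverManager

    url = 'https://app.electricitymap.org/zone/DK-DK2'

    # Selenium 4 takes the driver path through a Service object
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    driver.maximize_window()

    try:
        driver.get(url)
        time.sleep(2)  # give the page's JavaScript time to draw the gauges

        soup = bs(driver.page_source, 'html.parser')
        # Index [1] picks the renewable percentage (see the output below)
        renewable = [x.get_text() for x in soup.select('g[class="circular-gauge"] text')][1]
        print(renewable)
    finally:
        driver.quit()  # close the browser even if scraping fails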
    

    Output:

    69%
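To see why the first attempt printed [] while the Selenium version printed 69%, and to turn that string into the number the question asks for, here is a minimal offline sketch using only the standard library's html.parser. The two HTML snippets are made-up, simplified stand-ins, not electricityMap's real markup: SERVED_HTML imitates what a plain GET returns for a JavaScript app (an empty mount point plus a script tag), and RENDERED_HTML imitates the DOM after the browser has run that script, assuming, as the [1] index above suggests, that the percentage is the second <text> element matched. GaugeTextParser, gauge_texts and parse_percentage are hypothetical helpers that mimic the 'g[class="circular-gauge"] text' selector; with Selenium you would pass driver.page_source instead of the sample strings.

```python
from html.parser import HTMLParser


class GaugeTextParser(HTMLParser):
    """Hypothetical helper: collect the text of every <text> element nested
    inside a <g class="circular-gauge">, like the CSS selector
    'g[class="circular-gauge"] text'."""

    def __init__(self):
        super().__init__()
        self.gauge_depth = 0   # > 0 while inside a circular-gauge <g>
        self.in_text = False   # True while inside a <text> within a gauge
        self.texts = []

    def handle_starttag(self, tag, attrs):
        if tag == "g":
            if self.gauge_depth:
                self.gauge_depth += 1  # nested <g> inside a gauge
            elif dict(attrs).get("class") == "circular-gauge":
                self.gauge_depth = 1   # exact class match, as in [class="..."]
        elif tag == "text" and self.gauge_depth:
            self.in_text = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "g" and self.gauge_depth:
            self.gauge_depth -= 1
        elif tag == "text":
            self.in_text = False

    def handle_data(self, data):
        if self.in_text:
            self.texts[-1] += data  # data may arrive in several chunks


def gauge_texts(html):
    """Return the stripped text of all gauge <text> elements in html."""
    parser = GaugeTextParser()
    parser.feed(html)
    parser.close()
    return [t.strip() for t in parser.texts]


def parse_percentage(label):
    """Convert a label such as '69%' into a float such as 69.0."""
    return float(label.strip().rstrip("%"))


# Made-up sample: roughly what a plain GET of a JavaScript app returns
SERVED_HTML = '<html><body><div id="root"></div><script src="/app.js"></script></body></html>'

# Made-up sample: the kind of SVG the browser builds after running the script
RENDERED_HTML = """
<svg><g class="circular-gauge">
  <text>Renewable</text>
  <text>69%</text>
</g></svg>
"""

if __name__ == "__main__":
    print(gauge_texts(SERVED_HTML))               # no gauge in the served HTML
    renewable = gauge_texts(RENDERED_HTML)[1]
    print(renewable, parse_percentage(renewable))
```

The served snippet yields an empty list for the same reason the requests_html attempt did: the gauge only exists after JavaScript runs. Applied to the Selenium result above, parse_percentage('69%') gives 69.0, a plain number you can store or compare.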