python, selenium, web-scraping, lxml, pyppeteer

How to scrape dynamic data generated by JavaScript on a map


I'm a new Python user and I want to scrape data from this website: https://www.telerad.be/Html5Viewer/index.html?viewer=telerad_fr

My problem is that the data is generated dynamically. I have read about a few possible fixes, but none is satisfactory. With Selenium I need a name or an XPath to click a button, but I can't find either here.

import requests
from lxml import html

page = requests.get('https://www.telerad.be/Html5Viewer/index.html?viewer=telerad_fr')
tree = html.fromstring(page.content)

cities = tree.xpath('//*[@id="map-container"]/div[6]/div[2]/div/div[2]/div/div/div[1]/div/p[1]/text()[2]')


print('Cities: ', cities)

Solution

  • There actually IS an xpath to click on the buttons:

    //*[@id='0_layer']/*[@fill]
    

    Here, try this (selenium):

    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Firefox()  # assumption: any installed browser driver works
    driver.get("https://www.telerad.be/Html5Viewer/index.html?viewer=telerad_fr")

    dot_list = driver.find_elements(By.XPATH, "//*[@id='0_layer']/*[@fill]")
    for dot in dot_list:
        dot.click()
        cities = driver.find_element(By.XPATH, "//div[@data-region-name='NavigationMapRegion']//p[1]")
        print("Cities: ", cities.text)
        # The modal can intercept clicks on some dots, so close it once the text is extracted.
        close_btn = driver.find_element(By.XPATH, "//*[@class='panel-header-button right close-16']")
        close_btn.click()
    

    This code clicks (or at least tries to, provided no StaleElementReferenceException occurs) all the orange dots on the map and prints the "Cities" content (based on your XPath).

    If anyone finds an error in the code, please edit this answer; I wrote it in Notepad++.
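
One way to reduce the risk of the stale element references mentioned above is to look the dots up again before every click instead of reusing the first list. The following is only a minimal sketch of that idea and has not been run against the site. The function name `collect_dot_texts` and its parameters are hypothetical, and the string "xpath" is used in place of `By.XPATH` (it has the same value) so the function accepts any driver-like object.

```python
XPATH = "xpath"  # same value as selenium's By.XPATH, so no import is needed here


def collect_dot_texts(driver, dot_xpath, text_xpath, close_xpath):
    """Click every element matching dot_xpath; return the text read after each click."""
    texts = []
    total = len(driver.find_elements(XPATH, dot_xpath))
    for index in range(total):
        # Re-locate the dots on every pass: references found earlier may have
        # gone stale if the map redrew itself after the previous click.
        dots = driver.find_elements(XPATH, dot_xpath)
        if index >= len(dots):
            break  # the map now shows fewer dots than when we started
        dots[index].click()
        texts.append(driver.find_element(XPATH, text_xpath).text)
        # Close the panel so it cannot intercept the next click.
        driver.find_element(XPATH, close_xpath).click()
    return texts
```

With a real driver it could be called with the three XPath expressions from the answer: `collect_dot_texts(driver, "//*[@id='0_layer']/*[@fill]", "//div[@data-region-name='NavigationMapRegion']//p[1]", "//*[@class='panel-header-button right close-16']")`. If the panel loads slowly, an explicit wait before reading the text may also be needed.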