Search code examples
javascriptpythonweb-scrapingpython-requests-html

How to scrape text from tooltips generated with javascript


I've written the following code to get the positions of all blue markers in the map.

from bs4 import BeautifulSoup
from requests_html import HTMLSession
session = HTMLSession()

url="https://emf2.bundesnetzagentur.de/karte/Default.aspx?lat=52.4107723&lon=14.2930953&zoom=14"
r = session.get(url)
r.html.render(sleep = 3)
data = r.html.html

soup=BeautifulSoup(data,'html.parser')
BlueTriangles = soup.find_all(src="images/funk_hf.png")
for Triangle in BlueTriangles[1:]:
    TriangleStyle = Triangle['style']
    PixelPosition = TriangleStyle.split('transform: translate3d(')[1].split(', 0px); z')[0]
    print(PixelPosition)

r.session.close()

When I open the URL using a web browser, I see that each blue marker has a unique ID that is shown in a tooltip on mouseover:

enter image description here

The html code of the tooltip appears to be rendered triggered by a mouseover event:

enter image description here

Is there any way of scraping the the ID from the tooltip? I was wondering whether it is possible use the script parameter of render to force a mouseover event. But I couldn't find a way to integrate it in the code:

$('#foo').trigger('mouseover');

Solution

  • Points on the map are rendered by request to the endpoint https://emf2.bundesnetzagentur.de/karte/Standortservice.asmx/GetStandorteFreigabe with box coordinates (in this case {"Box":{"sued":52.39231101879802,"west":14.248666763305664,"nord":52.42927461241364,"ost":14.337587356567385}}).

    Response is json. Locations' data is encrypted by AES. Decryption code is available in js script loading with page (functions CryptParams and DecryptData).

    After decryption we get this nice data: "[{"Titel":"018126","Lng":14.311666,"Lat":52.428888,"fID":1076,"sonderseite":false},{"Titel":"011720","Lng":14.259722,"Lat":52.423054,"fID":2196,"sonderseite":false},{"Titel":"87011082","Lng":14.275832,"Lat":52.401666,"fID":560919,"sonderseite":false}]"

    You have two ways.

    1. Use selenium or similar software to render JS and try to parse resulting DOM;

    2. Write parser to send request to GetStandorteFreigabe endpoint and decode it's response (convert code from js to python),