Tags: javascript, python, web-scraping, scrapy, leaflet

How to scrape location data from a leaflet map?


I want to get the location (latitude, longitude) of the water level sensor markers shown on this website, but I can't find any HTML tags that contain their locations.

Any guidance would be very helpful!


Solution

  • Looking at the network inspector, you can see that the page sends a GET request which looks like an API call: https://app.pub.gov.sg/waterlevel/pages/GetWLInfo.aspx?type=WL&d=2023-08-15T07:00:00.000Z

    The endpoint returns the current data for all sensors as delimited text, which you can fetch and parse with the following:

    import requests
    import pandas as pd
    
    # Fetch data
    endpoint = "https://app.pub.gov.sg/waterlevel/pages/GetWLInfo.aspx"
    params = {"type": "WL"}
    response = requests.get(endpoint, params=params)
    # Decode the response body; `raw` is the delimited text parsed below
    raw = response.content.decode("utf-8")
    
    # Parse data
    # Records are separated by "$#$$@$" and fields within a record by "$#$"
    dataSplit = [sensor.split("$#$") for sensor in raw.split("$#$$@$")]
    data = []
    for record in dataSplit:
        # Convert data to dict and typecast
        if record not in data and len(record) == 7:
            data.append(
                {
                    "sensor-id": record[0],
                    "sensor-name": record[1],
                    "latitude": float(record[3]),
                    "longitude": float(record[2]),
                    "water-level": float(record[4]),
                    "status": float(record[5]),
                    "timestamp": parseTimestamp(record[6]),
                }
            )
    
    # Convert to dataframe
    df = pd.DataFrame(data)
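
The splitting step above can be tried offline before calling the endpoint. The following is only a minimal sketch: the `sample` payload is made up to mimic the two delimiters used in the answer (including an assumed trailing record separator), and `split_records` is a hypothetical helper name, so the real response may be laid out differently.

```python
FIELD_SEP = "$#$"       # separates fields inside one record
RECORD_SEP = "$#$$@$"   # separates records from each other


def split_records(raw: str, n_fields: int = 7) -> list[list[str]]:
    """Split a delimited payload into unique records with exactly n_fields fields."""
    records = []
    for chunk in raw.split(RECORD_SEP):
        fields = chunk.split(FIELD_SEP)
        # Skip empty or malformed chunks and exact duplicates
        if len(fields) == n_fields and fields not in records:
            records.append(fields)
    return records


# Made-up payload for illustration only; not captured from the real API
sample = (
    "S1$#$Sensor A$#$103.8$#$1.3$#$0.42$#$0$#$Aug 15 2023  7:00AM$#$$@$"
    "S2$#$Sensor B$#$103.9$#$1.4$#$0.10$#$1$#$Aug 15 2023  7:05AM$#$$@$"
)
records = split_records(sample)
print(len(records))
```

Checking the field count before typecasting, as the answer does with `len(record) == 7`, keeps the empty chunk after the final separator from raising an `IndexError`.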
    

    To parse timestamps, define the following helper before running the parsing loop above:

    from datetime import datetime
    
    def parseTimestamp(timestamp: str):
        # Standardise timestamp
        timestamp = timestamp.split(" ")
        while "" in timestamp:
            timestamp.remove("")
        # Pad day
        if len(timestamp[1]) < 2:
            timestamp[1] = "0" + timestamp[1]
        # Pad time
        if len(timestamp[-1].split(":")[0]) < 2:
            timestamp[-1] = "0" + timestamp[-1]
        timestamp = " ".join(timestamp)
        timestamp = datetime.strptime(timestamp, "%b %d %Y  %I:%M%p")
        return timestamp
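
As a possible simplification of the helper above: in CPython, `strptime` accepts day and hour values without a leading zero, so collapsing the repeated spaces may be enough. This is a sketch under that assumption; `parse_timestamp` is a hypothetical name, and the sample strings are inferred from the format string in the answer rather than taken from a real response.

```python
from datetime import datetime


def parse_timestamp(text: str) -> datetime:
    """Parse strings such as 'Aug  5 2023  7:05AM' into a datetime."""
    # str.split() with no arguments drops runs of whitespace,
    # so joining with a single space normalises the spacing
    normalised = " ".join(text.split())
    return datetime.strptime(normalised, "%b %d %Y %I:%M%p")


morning = parse_timestamp("Aug  5 2023  7:05AM")
evening = parse_timestamp("Aug 15 2023 11:30PM")
print(morning.isoformat(), evening.isoformat())
```

The month abbreviation (`%b`) and the AM/PM marker (`%p`) are locale-dependent, so this assumes an English locale.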