python, gis, esri

Accessing text data in a web-hosted GIS map (ESRI) via Python


I would like to interact with the web-hosted GIS map application at https://dwrapps.utah.gov/fishing/fStart to scrape the data it contains. The data is behind a toggle button.

Normally, building a soup from a page's text with BeautifulSoup and requests.get() is enough to make the text data parseable. Here, however, that approach returns some sort of Esri script and none of the desired HTML or text data.

[Image: snapshot of the website with the desired element inspected]

[Image: snapshot of the button toggled, showing the text data I'd like to scrape]

The code's first (mis)steps:

import requests
from bs4 import BeautifulSoup

site = 'https://dwrapps.utah.gov/fishing/fStart'

soup = BeautifulSoup(requests.get(site).text.lower(), 'html.parser')

The resulting soup is too long to post here, but it gives no way to reach the HTML data behind the toggle shown above.

I assume Selenium would do the trick, but I was curious whether there is an easier way to interact directly with the application.


Solution

  • The site loads its data as JSON from https://dwrapps.utah.gov/fishing/GetRegionReports (see the JS function getForecastData),

    so you can fetch that endpoint directly with requests:

    from json import dump
    from typing import List

    import requests

    url = "https://dwrapps.utah.gov/fishing/GetRegionReports"
    response = requests.get(url)
    response.raise_for_status()  # stop early on an HTTP error status
    reports: List[dict] = response.json()

    # Export the full JSON to the file gis-output.json.
    with open("gis-output.json", "w", encoding="utf-8") as fp:
        dump(reports, fp, ensure_ascii=False, indent=4)

    for dt in reports:
        reportData = dt.get("reportData", "")  # the full report text
        displayName = dt.get("displayName", "")
        # Do something with the data.
        # You can also access these fields:
        # regionAdm = dt.get("regionAdm", "")
        # updateDate = dt.get("updateDate", "")
        # dwrRating = dt.get("dwrRating", "")
        # ageDays = dt.get("ageDays", "")
        # publicCount = dt.get("publicCount", "")
        # finalRating = dt.get("finalRating", "")
        # lat = dt.get("lat", "")
        # lng = dt.get("lng", "")
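
If you only need the report text for particular waters, one option is to index the records by name once they are downloaded. The following is a minimal sketch, not part of the original answer: the helper names index_reports and find_report are hypothetical, the sample records are made up to mirror the displayName and reportData fields listed above, and it assumes both fields hold plain strings.

```python
from typing import Dict, List, Optional

# Made-up sample records shaped like the fields listed above.
# These values are illustrative only, not taken from the live service.
SAMPLE_REPORTS: List[dict] = [
    {"displayName": "Example Reservoir", "reportData": "Fishing is fair."},
    {"displayName": "Sample Lake", "reportData": ""},
]


def index_reports(records: List[dict]) -> Dict[str, str]:
    """Map lower-cased displayName to reportData, skipping empty entries."""
    index: Dict[str, str] = {}
    for record in records:
        name = str(record.get("displayName") or "").strip()
        text = str(record.get("reportData") or "").strip()
        if name and text:
            index[name.lower()] = text
    return index


def find_report(index: Dict[str, str], water_name: str) -> Optional[str]:
    """Look up a report by name, ignoring case and surrounding whitespace."""
    return index.get(water_name.strip().lower())


if __name__ == "__main__":
    idx = index_reports(SAMPLE_REPORTS)
    print(find_report(idx, "example reservoir"))
```

With the real data, you would pass the list returned by the GetRegionReports request instead of SAMPLE_REPORTS. Lower-casing the keys is a design choice to make lookups forgiving; drop it if distinct waters can differ only by case.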