
Fetching data from the Chrome browser's Network panel


I am scraping data from a website. I download the data.json file from the Network panel of the browser's developer tools (Inspect Element), then read the JSON file locally to store the results. My problem is that I want the script to fetch this data.json file automatically every couple of hours and record the information.


Solution

  • Don't try to get anything out of Chrome; you don't need the browser at all.

    The single-page app (SPA) on that site calls a metadata URL to get the current "directory" (a datetime) and then uses that directory to look up the latest interval_generation_data.

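    The script below fetches the data once a minute; change the time.sleep(60) call (for example to time.sleep(2 * 60 * 60)) to poll every couple of hours instead. Note that there is no error handling, so the loop stops with an exception the first time a request returns a 403 or any other bad response.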

    import time

    import requests

    s = requests.Session()
    # Browser-like headers, including the referer the outage map page sends
    s.headers.update({
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36',
        'referer': 'https://outagemap.coned.com/external/default.html',
        'x-requested-with': 'XMLHttpRequest',
    })

    metadata_url = 'https://outagemap.coned.com/resources/data/external/interval_generation_data/metadata.json'
    json_url = 'https://outagemap.coned.com/resources/data/external/interval_generation_data/'

    while True:
        # Step 1: ask for the name of the current data "directory" (a datetime).
        # The '_' parameter is a cache-buster so we don't get a stale copy.
        r = s.get(metadata_url, params={'_': int(time.time())}, timeout=30)
        r.raise_for_status()
        directory = r.json()['directory']

        # Step 2: fetch the latest data.json from that directory
        r = s.get(json_url + f'{directory}/data.json', params={'_': int(time.time())}, timeout=30)
        r.raise_for_status()
        print(r.json())
        time.sleep(60)  # poll once a minute
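The loop above prints a new result once a minute and stops on the first bad response. The question asks for a fetch every couple of hours with the results recorded, so here is a minimal sketch along those lines. It assumes metadata.json keeps returning a JSON object with a "directory" key and that <directory>/data.json is the file you want. The function names, the snapshots folder, the two-hour interval and the five-minute retry delay are hypothetical choices for illustration, not anything the site defines. requests is imported inside make_session() so that fetch_latest() and save_snapshot() work with any object that has a requests-style get() method.

```python
import json
import logging
import time
from pathlib import Path

BASE_URL = "https://outagemap.coned.com/resources/data/external/interval_generation_data/"
METADATA_URL = BASE_URL + "metadata.json"
POLL_SECONDS = 2 * 60 * 60  # fetch every two hours (illustrative choice)
RETRY_SECONDS = 5 * 60      # wait five minutes after a failed attempt (illustrative)

log = logging.getLogger(__name__)


def make_session():
    """Build a session that sends the same headers as the answer above."""
    import requests  # imported here so the helpers below work with any get()-compatible object

    s = requests.Session()
    s.headers.update({
        "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/84.0.4147.125 Safari/537.36",
        "referer": "https://outagemap.coned.com/external/default.html",
        "x-requested-with": "XMLHttpRequest",
    })
    return s


def fetch_latest(session, timeout=30):
    """Look up the current directory in metadata.json, then fetch its data.json."""
    cache_buster = {"_": int(time.time())}
    r = session.get(METADATA_URL, params=cache_buster, timeout=timeout)
    r.raise_for_status()
    directory = r.json()["directory"]

    r = session.get(f"{BASE_URL}{directory}/data.json", params=cache_buster, timeout=timeout)
    r.raise_for_status()
    return directory, r.json()


def save_snapshot(directory, data, out_dir="snapshots"):
    """Write one snapshot to <out_dir>/<directory>.json and return the path.

    Naming the file after the directory means polling twice before the site
    publishes new data overwrites the same file instead of creating a duplicate.
    """
    # Replace characters such as '/', ':' or spaces that are unsafe in file names
    safe_name = "".join(c if c.isalnum() or c in "-_" else "_" for c in str(directory))
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / f"{safe_name}.json"
    target.write_text(json.dumps(data), encoding="utf-8")
    return target


def poll_forever(session=None):
    """Fetch and save a snapshot every POLL_SECONDS, surviving failed requests."""
    if session is None:
        session = make_session()
    while True:
        try:
            directory, data = fetch_latest(session)
            saved = save_snapshot(directory, data)
            log.info("saved %s", saved)
            delay = POLL_SECONDS
        except (OSError, ValueError, KeyError, TypeError):
            # requests' exceptions subclass OSError; bad JSON raises ValueError;
            # an unexpected payload shape raises KeyError or TypeError.
            log.exception("fetch failed, retrying in %d seconds", RETRY_SECONDS)
            delay = RETRY_SECONDS
        time.sleep(delay)
```

To run it, configure logging so the messages are visible and start the loop: logging.basicConfig(level=logging.INFO) followed by poll_forever(). Each snapshot lands in the snapshots folder as its own JSON file, which you can later load with json.loads and combine however you need to record the information.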