I've been analyzing data that I manually collect daily from a feature layer in an ArcGIS map (linked below). I want to automate this process and have been looking for ways to use a RESTful API (or something else) to collect this information.
The task is to save this table (screenshot below) as a pandas DataFrame in Python that I can operate on.
I tried using combinations of the GET statement, and combinations of id keys, but I am unfamiliar with APIs and web-scraping.
Is this task feasible? Is it fairly simple to implement? What would be the starting steps for someone intermediate in Python, but unfamiliar with web-scraping?
Thanks!
link: http://erieny.maps.arcgis.com/apps/opsdashboard/index.html#/dd7f1c0c352e4192ab162a1dfadc58e1
screenshot of website with desired information in yellow square
This website is rendered almost entirely with JavaScript. That said, it's still possible to get the information you want, because the page makes HTTP requests to an API to load its data. If you locate that API and replicate the specific request it expects, you can retrieve the data directly.
To do this, open the Network tab in Chrome's developer tools and search for something you know should be in the data. I tried '14001', as I knew that had to be within the data.
Here you can see that the search has found the matching request. Scrolling down the XHR section of the Network tab, you can see the request URL and all the parameters.
To make this easier on yourself, copy the request as cURL (bash), as seen here. You can paste this into curl.trillworks.com, which will convert the request into Python using the requests library.
With the headers and the correct parameters in hand, it's now quite easy to get the data.
import requests
import pandas as pd

headers = {
    'Referer': 'http://erieny.maps.arcgis.com/apps/opsdashboard/index.html',
    'User-Agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Mobile Safari/537.36',
}

params = (
    ('f', 'json'),
    ('where', '1=1'),
    ('returnGeometry', 'false'),
    ('spatialRel', 'esriSpatialRelIntersects'),
    ('outFields', '*'),
    ('orderByFields', 'ZIP_CODE asc'),
    ('resultOffset', '0'),
    ('resultRecordCount', '80'),
    ('resultType', 'standard'),
    ('cacheHint', 'true'),
)

response = requests.get('https://services1.arcgis.com/CgOSc11uky3egK6O/arcgis/rest/services/erie_zip_codes_confirmed_counts/FeatureServer/0/query', headers=headers, params=params)
data = response.json()['features']

lists = []
for a in data:
    zipcode = a['attributes']['ZIP_CODE']
    confirmed = a['attributes']['CONFIRMED']
    lists.append((zipcode, confirmed))

df = pd.DataFrame(lists, columns=['Zip Code', 'Confirmed Cases'])
[('14001', 39),
('14004', 70),
('14006', 30),
('14013', 0),
('14025', 11),
('14026', 4),
('14030', 2),
('14031', 84),
('14032', 48),
('14033', 3),
('14034', 1),....]
    Zip Code  Confirmed Cases
0      14001               39
1      14004               70
2      14006               30
3      14013                0
4      14025               11
...      ...              ...
61     14225              257
62     14226              187
63     14227              260
64     14228              128
65     14260                0
We import the requests library, which makes HTTP requests easy to handle.
The requests.get() function sends a GET request to the URL we give it and returns the response. In this case the response body is JSON. Through its arguments we can specify the headers and parameters to send with the request.
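To get a feel for what the params argument amounts to, here is a minimal sketch that builds the query string with the standard library's urllib.parse.urlencode. requests does something broadly similar internally, though its exact escaping may differ slightly. Only a subset of the parameters from the answer is shown, and the names base_url, query_string and query_url are introduced here purely for illustration.

```python
from urllib.parse import urlencode

base_url = (
    "https://services1.arcgis.com/CgOSc11uky3egK6O/arcgis/rest/services/"
    "erie_zip_codes_confirmed_counts/FeatureServer/0/query"
)

# A subset of the parameters used in the answer, as (name, value) pairs.
params = (
    ("f", "json"),
    ("where", "1=1"),
    ("outFields", "*"),
    ("orderByFields", "ZIP_CODE asc"),
)

# urlencode percent-encodes each value and joins the pairs with "&".
query_string = urlencode(params)
query_url = f"{base_url}?{query_string}"
print(query_url)
```

Pasting the printed URL into a browser is a quick way to check what the endpoint returns before writing any more code.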
So we're using the correct params and headers to make the request. It turns out it's absolutely necessary to give the headers as well as the params. You can test this yourself; indeed, I often start with a simple GET request without any extra data to see how easy the request is to mimic. In this case you need both params and headers.
The response.json() method parses the JSON body of the response into a Python dictionary.
Now it takes a bit of time to find the information you want, so I encourage you to play about with this.
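One way to play about with the dictionary is to print an outline of its keys and value types. The following is only a sketch: describe is a hypothetical helper written for this illustration, and sample is a hand-made stand-in for response.json(), trimmed to two records and to the two fields used in this answer.

```python
import json

def describe(obj, depth=0, max_depth=3):
    """Return lines outlining the keys and value types of nested JSON data."""
    indent = "  " * depth
    lines = []
    if depth >= max_depth:
        return lines
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append(f"{indent}{key}: {type(value).__name__}")
            lines.extend(describe(value, depth + 1, max_depth))
    elif isinstance(obj, list) and obj:
        # Only inspect the first item; the rest normally share its shape.
        lines.append(f"{indent}[0]: {type(obj[0]).__name__} (of {len(obj)})")
        lines.extend(describe(obj[0], depth + 1, max_depth))
    return lines

# Hypothetical stand-in for response.json(), trimmed to two records.
sample = json.loads(
    '{"features": [{"attributes": {"ZIP_CODE": "14001", "CONFIRMED": 39}},'
    ' {"attributes": {"ZIP_CODE": "14004", "CONFIRMED": 70}}]}'
)

outline = describe(sample, max_depth=4)
print("\n".join(outline))
```

With the real response you would call describe(response.json()) instead of using the sample.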
It turns out the desired information is within response.json()['features'], which is a list of dictionaries, so we have to loop over it. In the loop, a refers to each list item in turn, and each item is one dictionary. We then follow the specific keys that lead to the values we want. In this case, the zip code is under the attributes key and then the ZIP_CODE key, and the confirmed count is likewise under the attributes key and then the CONFIRMED key. Again, I strongly urge you to play about with the dictionary converted from the JSON object to get a feel for this.
Here I'm packing the variables zipcode and confirmed into a tuple and appending that tuple to a list. You can then pass this list to pandas, as shown above.
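The loop can also be tried without touching the network. This is a minimal sketch, assuming the response has the shape described above: payload is a hand-written stand-in for response.json() using the first three rows of the output shown earlier, and extract_counts is a hypothetical helper that uses .get() so a missing key yields None or an empty result instead of raising KeyError.

```python
import json

# Hypothetical stand-in for response.json(); values taken from the output above.
payload = json.loads("""
{"features": [
  {"attributes": {"ZIP_CODE": "14001", "CONFIRMED": 39}},
  {"attributes": {"ZIP_CODE": "14004", "CONFIRMED": 70}},
  {"attributes": {"ZIP_CODE": "14006", "CONFIRMED": 30}}
]}
""")

def extract_counts(data):
    """Return (zip code, confirmed count) tuples from a query response dict."""
    rows = []
    for feature in data.get("features", []):
        attributes = feature.get("attributes", {})
        rows.append((attributes.get("ZIP_CODE"), attributes.get("CONFIRMED")))
    return rows

rows = extract_counts(payload)
print(rows)
```

The resulting list of tuples can be passed to pd.DataFrame with the same columns argument as in the answer.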