I'm trying to click on a map using Selenium so that I can scrape the parcel id and owner name from box-like containers. When a click is made on the map, a box-like container shows up, and I would like to scrape the parcel id and owner name from it. This is what a box-like container looks like. I tried using requests but could not find any way to locate the information shown in those containers, so now I'm trying Selenium. The script below neither clicks on the map nor throws any error.
```python
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

link = "http://app01.cityofboston.gov/parcelviewer/"

driver = webdriver.Chrome()
driver.get(link)
wait = WebDriverWait(driver, 20)
for item in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "svg#mapDiv_gc"))):
    item.click()
driver.quit()
```
How can I grab the parcel Ids and the owner names from different box-like containers out of that map?
This data comes from an ArcGIS REST service.
I've located this ArcGIS query call, which returns the wanted data:
GET https://services.arcgis.com/sFnw0xNflSi8J0uh/arcgis/rest/services/Parcels19WMFull/FeatureServer/0/query
I checked what generates this URL and found the web map id in the app's config file (config/ParcelViewer.json), under `values.webmap`: 2da765769ee34446a396a9c9010f5631
This query is issued when you search for something in the input box in the top-left corner. You can edit the URL parameters to match all records:
{
"f": "json",
"where": "1=1",
"returnGeometry": "true",
"spatialRel": "esriSpatialRelIntersects",
"outFields": "*",
"outSR": "102100"
}
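To try these parameters by hand, they only need to be URL-encoded onto the query endpoint. The sketch below builds that GET URL with the standard library so it can be pasted into a browser; `build_query_url` is a hypothetical helper name, and `requests.get(url, params=params)` performs the same encoding for you.

```python
from urllib.parse import urlencode

# Query endpoint found above
QUERY_URL = ("https://services.arcgis.com/sFnw0xNflSi8J0uh/arcgis/rest/services/"
             "Parcels19WMFull/FeatureServer/0/query")

params = {
    "f": "json",
    "where": "1=1",
    "returnGeometry": "true",
    "spatialRel": "esriSpatialRelIntersects",
    "outFields": "*",
    "outSR": "102100",
}


def build_query_url(base: str, query_params: dict) -> str:
    """Return the full GET URL for an ArcGIS FeatureServer layer query."""
    return f"{base}?{urlencode(query_params)}"


url = build_query_url(QUERY_URL, params)
print(url)
```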
This query returns at most 2000 items per request, so we need to iterate. To see how, look at the content of the `features` array; for instance, this query gives something like this:
{
"attributes": {
"FID": 1,
"FULL_ADDRE": "104 A 104 PUTNAM ST, 02128",
"PID": "0100001000"
}
},
{
"attributes": {
"FID": 2,
"FULL_ADDRE": "18 LEVERETT AV #10-B, 02128",
"PID": "0101399120"
}
},
{
"attributes": {
"FID": 3,
"FULL_ADDRE": "197 LEXINGTON ST, 02128",
"PID": "0100002000"
}
}
....
So we can paginate on the FID field: the second request uses where=FID > 2000, and for each following request we store the last FID received and set the where clause to FID > {last_fid}.
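The paging rule can be isolated in a small function, shown here as a sketch that assumes the 2000-record limit observed above and that pages come back sorted by FID. `next_where` and `PAGE_SIZE` are hypothetical names; the sample page reuses the three features shown earlier.

```python
from typing import List, Optional

PAGE_SIZE = 2000  # limit observed for this service; other services may differ


def next_where(features: List[dict], page_size: int = PAGE_SIZE, key: str = "FID") -> Optional[str]:
    """Return the where clause for the next request, or None once the last page was reached."""
    if len(features) < page_size:
        # a short page means there is nothing left to fetch
        return None
    # largest FID on this page; assumes pages are returned in FID order
    last_fid = max(f["attributes"][key] for f in features)
    return f"{key} > {last_fid}"


page = [
    {"attributes": {"FID": 1, "PID": "0100001000"}},
    {"attributes": {"FID": 2, "PID": "0101399120"}},
    {"attributes": {"FID": 3, "PID": "0100002000"}},
]
print(next_where(page, page_size=3))  # FID > 3
print(next_where(page))               # None
```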
Putting it together, a Python script could look like this:
```python
import requests

base_url = "http://app01.cityofboston.gov/parcelviewer"

# get the web map id from the app's config
r = requests.get(f"{base_url}/config/ParcelViewer.json")
map_id = r.json()["values"]["webmap"]

# get the layer (query) URL from the web map definition
r = requests.get(f"https://www.arcgis.com/sharing/rest/content/items/{map_id}/data", params={
    "f": "json"
})
url = r.json()["operationalLayers"][0]["url"]

params = {
    "f": "json",
    "where": "1=1",
    "returnGeometry": "true",
    "spatialRel": "esriSpatialRelIntersects",
    "outFields": "*",
    "outSR": "102100",
    # sort by FID so that "FID > last_fid" resumes exactly where the previous page ended
    "orderByFields": "FID",
}

data = []
count = 1
finish = False
while not finish:
    print(f"[{count}] requesting...")
    r = requests.get(f"{url}/query", params=params)
    r.raise_for_status()
    entries = r.json()["features"]
    data.extend(entries)
    print(f"[{count}] received {len(entries)} items - total received: {len(data)}")
    if len(entries) < 2000:
        # a short page means this was the last one
        finish = True
    else:
        last_fid = entries[-1]["attributes"]["FID"]
        print(f"next fid: {last_fid}")
        params["where"] = f"FID > {last_fid}"
    count += 1

print(f"TOTAL: {len(data)}")
# print the last element (just to check)
print(data[-1])
```
After several minutes, the script extracted 171922 records.
This is what an entry looks like:
{
'attributes': {
'FID': 171922,
'PID_LONG': '2205670000',
'PID': '2205670000',
'GIS_ID': '2205670000',
'FULL_ADDRE': '2203 COMMONWEALTH AV, 02135',
'OWNER': 'COMMWLTH OF MASS',
'LAND_USE': 'E',
'LAND_SF': 34125,
'LIVING_ARE': 7386,
'AV_LAND': 1325400,
'AV_BLDG': 841100,
'AV_TOTAL': 2166500,
'GROSS_TAX': 0,
'ID': 0,
'SHAPE_Leng': 1003.12908156,
'SHAPE_Area': 33512.6220608,
'Shape__Area': 5702.6640625,
'Shape__Length': 414.046143349521
},
'geometry': {
'rings': [
[
[-7922244.91043368, 5212145.61745703],
[-7922247.98527419, 5212105.5446644],
[-7922243.75007186, 5212106.29247827],
[-7922235.83595224, 5212062.80771992],
[-7922239.05526106, 5212062.68000813],
[-7922327.54387782, 5212214.66112252],
[-7922281.74795739, 5212208.62518937],
[-7922266.82960043, 5212207.97287607],
[-7922241.02937963, 5212204.61661323],
[-7922244.0269726, 5212158.45234151],
[-7922244.91043368, 5212145.61745703]
]
]
}
}
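Since the question only asks for the parcel id and the owner name, those two attributes can be pulled out of the collected `data` list and written as CSV. This is a minimal sketch using the `PID` and `OWNER` fields seen in the entry above; `parcel_rows` and `to_csv` are hypothetical helper names, and the sample reuses that entry's values.

```python
import csv
import io


def parcel_rows(features):
    """Yield (PID, OWNER) pairs from ArcGIS query features."""
    for feature in features:
        attrs = feature["attributes"]
        yield attrs.get("PID"), attrs.get("OWNER")


def to_csv(features) -> str:
    """Return the parcel ids and owner names as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["PID", "OWNER"])
    writer.writerows(parcel_rows(features))
    return buf.getvalue()


sample = [{"attributes": {"FID": 171922, "PID": "2205670000", "OWNER": "COMMWLTH OF MASS"}}]
print(to_csv(sample))
```

With the full result, `to_csv(data)` can be written to a file opened with `newline=""`.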
One last thing: to check the result count directly against the API, we can use the ArcGIS query UI, such as this one (it is the layer the website uses). When "count only" is selected, it adds the parameter returnCountOnly=true; let's do the same on our query endpoint:
which correctly returns:
{"count":171922}
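The same check can be scripted. In this sketch `count_params` and `api_count` are hypothetical helpers; with requests the count would be `requests.get(f"{url}/query", params=count_params()).json()["count"]`. Here the response body shown above is parsed offline and compared with the number of scraped records.

```python
import json


def count_params(where: str = "1=1") -> dict:
    # returnCountOnly=true makes the layer return only the number of matching records
    return {"f": "json", "where": where, "returnCountOnly": "true"}


def api_count(response_text: str) -> int:
    """Parse the body of a returnCountOnly query, e.g. {"count":171922}."""
    return json.loads(response_text)["count"]


expected = api_count('{"count":171922}')  # the response shown above
collected = 171922                        # len(data) from the scraping script
print("complete" if collected == expected else f"missing {expected - collected} records")
```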
Note that you can apply a variant of this script to any ArcGIS REST service query. I've made an example in this gist that gets the data from the map (cities). Also note that the maximum number of results returned per request (the layer's maxRecordCount) may differ from one service to another.
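A generic variant could page with the `resultOffset` and `resultRecordCount` query parameters instead of an FID filter. This is a sketch under the assumption that the layer supports pagination (its `?f=json` description usually reports `supportsPagination` and `maxRecordCount`); pairing it with `orderByFields` keeps page boundaries stable. `paginate` and `make_fake_fetch` are hypothetical names, and the fake fetch only stands in for the server so the paging logic can be tried offline.

```python
from typing import Callable, Iterator


def paginate(fetch: Callable[[dict], dict], base_params: dict, page_size: int) -> Iterator[dict]:
    """Yield every feature of a layer query, one page of page_size records at a time."""
    offset = 0
    while True:
        params = dict(base_params, resultOffset=offset, resultRecordCount=page_size)
        body = fetch(params)
        features = body.get("features", [])
        yield from features
        # stop on an empty page, or on a short page the server did not flag as truncated
        if not features or (not body.get("exceededTransferLimit") and len(features) < page_size):
            break
        offset += len(features)


def make_fake_fetch(n: int) -> Callable[[dict], dict]:
    """Hypothetical offline stand-in for a layer holding n records."""
    records = [{"attributes": {"FID": i + 1}} for i in range(n)]

    def fetch(params: dict) -> dict:
        start, size = params["resultOffset"], params["resultRecordCount"]
        return {"features": records[start:start + size],
                "exceededTransferLimit": start + size < n}

    return fetch


features = list(paginate(make_fake_fetch(4500), {"where": "1=1", "outFields": "*"}, 2000))
print(len(features))  # 4500
```

Against a real service, `fetch` could be `lambda p: requests.get(f"{url}/query", params={**p, "f": "json"}).json()`, with `page_size` set to the layer's maxRecordCount.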