I'm trying to scrape this site: https://www.dysportusa.com/find-a-specialist
The page loads its data from an API via a GET request.
I have a list of coordinates with their corresponding zip codes and am using them to call the API, but I keep getting this error:
{'message': 'Validation Error', 'status': 403, 'data': {'message': 'You have no permission to access this API. Contact admin'}}
When I run a search on the website and then open the API URL directly, I sometimes get an error as well, even though the website itself loaded the data.
Is there any way around this?
Code below:
import requests
from datetime import date

clinic_data = {}
column_names = ["Name", "Address", "City", "State", "Zipcode", "Email", "Phone Number", "Specialties"]

coordinates = [
    (40.750742, -73.99653),    # 10001 (New York, NY)
    (33.973951, -118.248405),  # 90001 (Los Angeles, CA)
    (41.88531, -87.621238),    # 60601 (Chicago, IL)
    (29.775186, -95.31022),    # 77001 (Houston, TX)
    (25.787676, -80.224145)    # 33101 (Miami, FL)
]

zipcodes = [
    "10001",  # New York, NY
    "90001",  # Los Angeles, CA
    "60601",  # Chicago, IL
    "77001",  # Houston, TX
    "33101"   # Miami, FL
]

for i in range(len(coordinates)):
    latitude = coordinates[i][0]
    longitude = coordinates[i][1]
    zipcode = zipcodes[i]

    api_url = "https://www.dysportusa.com/api/find-a-specialist?latitude={lat}&longitude={long}&take=100&radius=150&zipToSortBy={zip}".format(lat=latitude, long=longitude, zip=zipcode)

    try:
        r = requests.get(api_url).json()
        print(r)
    except:
        continue
The website is using some rudimentary throttling / request validation / authorization based on cookie values and HTTP headers. Part of why your requests look suspicious is that you aren't sending any headers at all, so the site can easily tell the requests come from Python's requests module rather than a browser. Adding the headers and cookies below should solve the issue.
import requests
from datetime import datetime

clinic_data = {}
column_names = ["Name", "Address", "City", "State", "Zipcode", "Email", "Phone Number", "Specialties"]

coordinates = [
    (40.750742, -73.99653),    # 10001 (New York, NY)
    (33.973951, -118.248405),  # 90001 (Los Angeles, CA)
    (41.88531, -87.621238),    # 60601 (Chicago, IL)
    (29.775186, -95.31022),    # 77001 (Houston, TX)
    (25.787676, -80.224145)    # 33101 (Miami, FL)
]

zipcodes = [
    "10001",  # New York, NY
    "90001",  # Los Angeles, CA
    "60601",  # Chicago, IL
    "77001",  # Houston, TX
    "33101"   # Miami, FL
]

print("Running")

for i in range(len(coordinates)):
    latitude = coordinates[i][0]
    longitude = coordinates[i][1]
    zipcode = zipcodes[i]

    # Build the timestamps that the OneTrust consent cookies normally carry,
    # so the request looks like it came from a browser that accepted the cookie banner.
    datestamp = datetime.utcnow().strftime('%a+%b+%d+%Y+%H%%3A%M%%3A%S+GMT%%2B0000+(Coordinated+Universal+Time)')
    optanon_alert_box_closed = datetime.utcnow().isoformat() + 'Z'

    api_url = "https://www.dysportusa.com/api/find-a-specialist"
    params = {
        'latitude': latitude,
        'longitude': longitude,
        'take': '100',
        'radius': '150',
        'zipToSortBy': zipcode
    }
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:127.0) Gecko/20100101 Firefox/127.0',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Language': 'en-US,en;q=0.5',
        'Accept-Encoding': 'gzip, deflate, br, zstd',
        'X-Requested-With': 'XMLHttpRequest',
        'DNT': '1',
        'Connection': 'keep-alive',
        'Referer': 'https://www.dysportusa.com/find-a-specialist',
        'Cookie': f'OptanonConsent=isGpcEnabled=0&datestamp={datestamp}&version=202309.1.0&browserGpcFlag=0&isIABGlobal=false&hosts=&landingPath=NotLandingPage&groups=C0001%3A1%2CC0002%3A0%2CC0004%3A0&geolocation=%3B&AwaitingReconsent=false; OptanonAlertBoxClosed={optanon_alert_box_closed}',
        'Sec-Fetch-Dest': 'empty',
        'Sec-Fetch-Mode': 'cors',
        'Sec-Fetch-Site': 'same-origin',
        'TE': 'trailers'
    }

    try:
        r = requests.get(api_url, headers=headers, params=params).json()
        print(r)
    except Exception:
        # Skip this location if the request fails or returns something that isn't JSON.
        continue
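Hand-building the OptanonConsent cookie is somewhat brittle. An alternative worth trying is a requests.Session that loads the landing page first, so whatever cookies the server sets are stored on the session and sent automatically with the API call. I can't guarantee this alone satisfies the site's checks; the sketch below is just an illustration, and fetch_specialists is a hypothetical helper, not part of the site's API.

import requests

BASE = "https://www.dysportusa.com"

def fetch_specialists(session, latitude, longitude, zipcode):
    # Hypothetical helper for illustration only.
    headers = {
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:127.0) Gecko/20100101 Firefox/127.0',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'X-Requested-With': 'XMLHttpRequest',
        'Referer': BASE + '/find-a-specialist',
    }
    params = {
        'latitude': latitude,
        'longitude': longitude,
        'take': '100',
        'radius': '150',
        'zipToSortBy': zipcode,
    }
    # Load the landing page first so any cookies the server sets are stored
    # on the session and reused automatically for the API request.
    session.get(BASE + '/find-a-specialist', headers={'User-Agent': headers['User-Agent']})
    return session.get(BASE + '/api/find-a-specialist', headers=headers, params=params)

with requests.Session() as s:
    resp = fetch_specialists(s, 40.750742, -73.99653, "10001")
    print(resp.status_code)
    print(resp.json() if resp.ok else resp.text[:200])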
If the website admins want to keep you from scraping their service, they may add additional barriers later.
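If the site starts throttling or returns 403 intermittently, spacing requests out and retrying with a back-off is a reasonable first defence. This is a minimal sketch that assumes nothing about the site beyond the status codes it returns; get_with_backoff is a hypothetical helper.

import random
import time
import requests

def get_with_backoff(url, headers=None, params=None, attempts=4):
    # Hypothetical helper: back off and retry when the site rejects or throttles us.
    resp = None
    for attempt in range(attempts):
        resp = requests.get(url, headers=headers, params=params)
        if resp.status_code not in (403, 429):
            return resp
        # Wait longer after each rejection, with a little jitter so the
        # retries don't land at perfectly regular intervals.
        time.sleep(2 ** attempt + random.random())
    return resp

# It also helps to pause between locations so the requests don't arrive in a burst, e.g.:
# time.sleep(random.uniform(2, 5))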