python · web-scraping · python-requests

Webscraping with API using GET Request, "Validation Error"


I'm trying to scrape this site: https://www.dysportusa.com/find-a-specialist

There's an API that's queried with a GET request:

https://www.dysportusa.com/api/find-a-specialist?latitude=32.79742543951647&longitude=-117.24333717347025&take=100&radius=150&zipToSortBy=92109

I have a list of coordinates with their corresponding zipcodes that I use to call the API, but I keep getting this error:

{'message': 'Validation Error', 'status': 403, 'data': {'message': 'You have no permission to access this API. Contact admin'}}

When I use the website to trigger a call and then open the API URL directly, I sometimes get an error as well, even though the website itself loaded the data.

Is there any way around this?

Code below:

import requests
from datetime import date


clinic_data = {}
column_names = ["Name", "Address", "City", "State", "Zipcode", "Email", "Phone Number", "Specialties"]

coordinates = [
    (40.750742, -73.99653),    # 10001 (New York, NY)
    (33.973951, -118.248405),  # 90001 (Los Angeles, CA)
    (41.88531, -87.621238),    # 60601 (Chicago, IL)
    (29.775186, -95.31022),    # 77001 (Houston, TX)
    (25.787676, -80.224145)   # 33101 (Miami, FL)
]

zipcodes = [
    "10001",  # New York, NY
    "90001",  # Los Angeles, CA
    "60601",  # Chicago, IL
    "77001",  # Houston, TX
    "33101"  # Miami, FL
]

for i in range(len(coordinates)):
    latitude = coordinates[i][0]
    longitude = coordinates[i][1]
    zipcode = zipcodes[i]

    api_url = "https://www.dysportusa.com/api/find-a-specialist?latitude={lat}&longitude={long}&take=100&radius=150&zipToSortBy={zip}".format(lat=latitude, long=longitude, zip=zipcode)
    try:
        r = requests.get(api_url).json()
        print(r)

    except:
        continue

Solution

  • The website uses some rudimentary throttling / request validation based on cookie values and HTTP headers. Part of why your requests look suspicious is that you aren't providing any HTTP headers of your own, so the default User-Agent makes it obvious the requests come from Python's requests module rather than a browser. Supplying browser-like headers (and the consent cookies the site expects) should resolve the issue:

    import requests
    from datetime import datetime
    
    clinic_data = {}
    column_names = ["Name", "Address", "City", "State", "Zipcode", "Email", "Phone Number", "Specialties"]
    
    coordinates = [
        (40.750742, -73.99653),    # 10001 (New York, NY)
        (33.973951, -118.248405),  # 90001 (Los Angeles, CA)
        (41.88531, -87.621238),    # 60601 (Chicago, IL)
        (29.775186, -95.31022),    # 77001 (Houston, TX)
        (25.787676, -80.224145)   # 33101 (Miami, FL)
    ]
    
    zipcodes = [
        "10001",  # New York, NY
        "90001",  # Los Angeles, CA
        "60601",  # Chicago, IL
        "77001",  # Houston, TX
        "33101"  # Miami, FL
    ]
    
    print("Running")
    
    for i in range(len(coordinates)):
        latitude = coordinates[i][0]
        longitude = coordinates[i][1]
        zipcode = zipcodes[i]
    
        # Mimic the OneTrust consent cookies (OptanonConsent / OptanonAlertBoxClosed)
        # that a real browser would have set after dismissing the cookie banner.
        datestamp = datetime.utcnow().strftime('%a+%b+%d+%Y+%H%%3A%M%%3A%S+GMT%%2B0000+(Coordinated+Universal+Time)')
        optanon_alert_box_closed = datetime.utcnow().isoformat() + 'Z'
    
        api_url = "https://www.dysportusa.com/api/find-a-specialist"
        params = {
            'latitude': latitude,
            'longitude': longitude,
            'take': '100',
            'radius': '150',
            'zipToSortBy': zipcode,
        }
        headers = {
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:127.0) Gecko/20100101 Firefox/127.0',
            'Accept': 'application/json, text/javascript, */*; q=0.01',
            'Accept-Language': 'en-US,en;q=0.5',
            'Accept-Encoding': 'gzip, deflate, br, zstd',
            'X-Requested-With': 'XMLHttpRequest',
            'DNT': '1',
            'Connection': 'keep-alive',
            'Referer': 'https://www.dysportusa.com/find-a-specialist',
            'Cookie': f'OptanonConsent=isGpcEnabled=0&datestamp={datestamp}&version=202309.1.0&browserGpcFlag=0&isIABGlobal=false&hosts=&landingPath=NotLandingPage&groups=C0001%3A1%2CC0002%3A0%2CC0004%3A0&geolocation=%3B&AwaitingReconsent=false; OptanonAlertBoxClosed={optanon_alert_box_closed}',
            'Sec-Fetch-Dest': 'empty',
            'Sec-Fetch-Mode': 'cors',
            'Sec-Fetch-Site': 'same-origin',
            'TE': 'trailers'
        }
        try:
            r = requests.get(api_url, headers=headers, params=params).json()
            print(r)
        except (requests.RequestException, ValueError):
            # Skip locations whose request or JSON decoding fails.
            continue
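
    The `clinic_data` / `column_names` scaffolding above is never populated. Inside the loop, after `r` is assigned, something like the sketch below could collect one row per clinic. The API's success schema isn't shown in the question, so every key used here (`data`, `name`, `address`, and so on) is a hypothetical placeholder; print one raw response and substitute the real keys:

    # All keys below are hypothetical -- inspect an actual response first.
    for clinic in r.get('data', []):
        clinic_data[clinic.get('name')] = [
            clinic.get(key) for key in (
                'name', 'address', 'city', 'state',
                'zipcode', 'email', 'phone', 'specialties',
            )
        ]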
    

    If the website's admins want to keep you from scraping their service, they may well add further barriers over time.
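
    If that happens, a more resilient pattern is to let a `requests.Session` pick up whatever cookies the landing page sets, instead of hand-crafting the `Cookie` header, and to pause between calls. This is a minimal sketch, not a guaranteed bypass; whether it helps depends entirely on which checks the site actually performs:

    import time
    import requests

    session = requests.Session()
    session.headers.update({
        'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:127.0) Gecko/20100101 Firefox/127.0',
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'X-Requested-With': 'XMLHttpRequest',
        'Referer': 'https://www.dysportusa.com/find-a-specialist',
    })

    # Visiting the public page first lets the server set any cookies it
    # expects to see on later API calls.
    session.get('https://www.dysportusa.com/find-a-specialist', timeout=30)

    for (latitude, longitude), zipcode in zip(coordinates, zipcodes):
        params = {
            'latitude': latitude,
            'longitude': longitude,
            'take': '100',
            'radius': '150',
            'zipToSortBy': zipcode,
        }
        try:
            resp = session.get('https://www.dysportusa.com/api/find-a-specialist',
                               params=params, timeout=30)
            resp.raise_for_status()
            print(resp.json())
        except (requests.RequestException, ValueError) as exc:
            print(f'Request for {zipcode} failed: {exc}')
        time.sleep(2)  # a fixed pause keeps the request rate conservative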