Tags: python, web-scraping, python-requests, screen-scraping

Scraping data from a site with a "Load more" button, and the JSON endpoint doesn't load


I am trying to scrape all the available jobs from https://apply.workable.com/fitxr/. The problem is that the site uses JavaScript and has a "Load more" button.

I opened the Network tab in Chrome's developer tools and found the JSON endpoint the site uses.

However, when I open https://apply.workable.com/api/v3/accounts/fitxr/jobs directly in the browser, I get a "not found" error.

I'm not sure how to get the data.

Here is the code I wrote to try to scrape the data via XPath:

    import requests
    from lxml import html

    data = []
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0"
    }
    url = "https://apply.workable.com/fitxr/"
    page = requests.get(url, headers=headers)
    tree = html.fromstring(page.content)
    xpath = '/html/body/div/div/div/main/div[2]/ul/li[*]/div/h3'
    jobs = tree.xpath(xpath)
    for job in jobs:
        print(job.text)

And here is my attempt using the JSON endpoint:

    import requests

    data = []
    url = "https://apply.workable.com/api/v3/accounts/fitxr/jobs"
    r = requests.get(url)
    json = r.json()
    for x in range(len(json["results"])):
        print(json["results"][x]["title"])

Neither snippet returns anything.


Solution

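  • The request you found in your browser's developer tools is a POST request to the /jobs endpoint, but your code uses requests.get(), which sends a GET request to the same endpoint. The /jobs endpoint apparently does not respond to GET requests, which likely also explains the "not found" error when you open the URL in the browser (the address bar always sends a GET).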

    Change your call from requests.get() to requests.post():

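    import requests

    url = "https://apply.workable.com/api/v3/accounts/fitxr/jobs"
    r = requests.post(url)  # POST, matching the request seen in the browser
    r.raise_for_status()    # fail loudly instead of silently printing nothing
    data = r.json()         # named "data" to avoid shadowing the json module
    for job in data["results"]:
        print(job["title"])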

    Results:

    Engineering Manager - Services & Full Stack
    Interim Talent Partner
    Customer Experience Manager
    Content Manager (Production)
    Performance Marketing Manager
    Performance Marketing Manager
    Content Creator (Fitness and Music)
    Content Creator (Fitness and Music)
    Automation Tester
    Engineering Manager - Security, Data and DevOps
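The "Load more" button suggests the endpoint is paginated, so a single POST may only return the first batch of jobs. Below is a minimal sketch of following the pages, assuming (not verified here) that each response carries a `nextPage` token and that the next request sends it back as `token` in a JSON body. Both field names are guesses, so compare them against the request the "Load more" button actually sends in the Network tab. `fetch_page` and `iter_job_titles` are hypothetical helper names.

```python
API_URL = "https://apply.workable.com/api/v3/accounts/fitxr/jobs"


def fetch_page(token=None):
    """POST to the jobs endpoint, optionally sending a page token (assumed field name)."""
    # Imported here so the pagination logic below can be exercised without network access.
    import requests

    # First page: same bare POST as the answer above. Later pages: assumed {"token": ...} body.
    kwargs = {"json": {"token": token}} if token else {}
    r = requests.post(API_URL, timeout=10, **kwargs)
    r.raise_for_status()
    return r.json()


def iter_job_titles(fetch=fetch_page):
    """Yield job titles from every page, following the assumed "nextPage" token."""
    token = None
    seen_tokens = set()  # guard against an endpoint that keeps returning the same token
    while True:
        page = fetch(token)
        for job in page.get("results", []):
            yield job["title"]
        token = page.get("nextPage")  # assumption: missing or empty on the last page
        if not token or token in seen_tokens:
            break
        seen_tokens.add(token)
```

To print every title, loop over `iter_job_titles()` and print each item. Because the fetching function is passed in as `fetch`, you can swap in a stub that returns canned dictionaries to check the pagination logic without hitting the site.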