I am trying to scrape all the available jobs from the following site: https://apply.workable.com/fitxr/
The issue is that the site uses JavaScript and has a "load more" button.
I opened the Network tab in Chrome's developer tools and found the JSON endpoint that the site uses,
but when I go to https://apply.workable.com/api/v3/accounts/fitxr/jobs
I get a "not found" error.
I'm not sure how to get the data.
Here is the code I wrote to try to scrape the data via XPath:
import requests
from lxml import html

data = []
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:90.0) Gecko/20100101 Firefox/90.0"
}
url = "https://apply.workable.com/fitxr/"
page = requests.get(url, headers=headers)
tree = html.fromstring(page.content)
# Absolute XPath to the job title headings in the rendered page
xpath = '/html/body/div/div/div/main/div[2]/ul/li[*]/div/h3'
jobs = tree.xpath(xpath)
for job in jobs:
    print(job.text)
And here is the version using the JSON endpoint:
import requests

data = []
url = "https://apply.workable.com/api/v3/accounts/fitxr/jobs"
r = requests.get(url)
json = r.json()
for x in range(len(json["results"])):
    print(json["results"][x]["title"])
Both sets of code return nothing.
The request you found in your browser's development tools is a POST request to the /jobs endpoint; your attempt used requests.get (which sends a GET request to the same endpoint). /jobs does not respond to GET requests, apparently.
Change your call from requests.get() to requests.post() instead:
import requests

url = "https://apply.workable.com/api/v3/accounts/fitxr/jobs"
# The endpoint answers POST requests; a GET returns "not found"
r = requests.post(url)
jobs = r.json()
for job in jobs["results"]:
    print(job["title"])
Results:
Engineering Manager - Services & Full Stack
Interim Talent Partner
Customer Experience Manager
Content Manager (Production)
Performance Marketing Manager
Performance Marketing Manager
Content Creator (Fitness and Music)
Content Creator (Fitness and Music)
Automation Tester
Engineering Manager - Security, Data and DevOps
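
If you want the GET/POST mix-up to fail loudly rather than silently, a slightly more defensive variant might look like the sketch below. The helper names extract_titles and fetch_titles are my own (hypothetical), and the only response keys assumed are "results" and "title", as used in the code above. The HTTP call is passed in as a parameter so the parsing can be tried without touching the network.

```python
API_URL = "https://apply.workable.com/api/v3/accounts/fitxr/jobs"


def extract_titles(payload):
    """Return the job titles from a decoded /jobs response body."""
    # "results" and "title" are the keys used in the answer's snippet;
    # a body without "results" yields an empty list.
    return [job["title"] for job in payload.get("results", [])]


def fetch_titles(post, url=API_URL):
    """Call post(url) and return the job titles, raising on HTTP errors.

    `post` is any callable with the shape of requests.post.
    """
    response = post(url, timeout=10)
    # With requests, a 4xx/5xx status (such as the 404 seen with GET)
    # raises requests.HTTPError here instead of failing later in .json().
    response.raise_for_status()
    return extract_titles(response.json())


# Usage (performs a real network request):
#   import requests
#   for title in fetch_titles(requests.post):
#       print(title)
```

Passing requests.get instead of requests.post should then surface the "not found" response as an exception at raise_for_status(), which makes this kind of problem easier to spot.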