php web-scraping web-crawler data-extraction

Scrape and extract data from https://chenmed.wd1.myworkdayjobs.com/en-US/jencare/ when it is not visible in the page's source code

I am trying to write an automated PHP script to scrape and extract all job titles (Primary Care Physician - Tidewater Market, Primary Care Physician - Richmond Market, etc.) from https://chenmed.wd1.myworkdayjobs.com/en-US/jencare/

However, this does not seem to be straightforward, because the required data does not appear in the page's source code. I also tried inspecting the Network tab of the developer tools in several browsers, but could not locate the source of the data.

Any help would be highly appreciated.

Thanks & Regards!

Solution

Looking at the requests made by the website (Developer Tools -> Network), one notices an XHR request that returns the data you care about.

However, visiting that URL in a browser gives the same result as navigating to https://chenmed.wd1.myworkdayjobs.com/en-US/jencare/.

Investigating further by looking at the request headers of that XHR request, one notices Accept:application/json,application/xml, which signifies that the client expects a JSON or XML document. Indeed, requesting https://chenmed.wd1.myworkdayjobs.com/en-US/jencare/ with this additional header returns the desired data:

>>> import urllib.request
>>> req = urllib.request.Request('https://chenmed.wd1.myworkdayjobs.com/en-US/jencare/')
>>> # Ask for JSON/XML instead of the HTML page
>>> req.add_header('Accept', 'application/json,application/xml')
>>> 'Primary Care Physician' in urllib.request.urlopen(req).read().decode('utf-8')
True

Therefore, in PHP you would probably do the following:

Request https://chenmed.wd1.myworkdayjobs.com/en-US/jencare/ with the additional header Accept:application/json,application/xml (see e.g. How do I send a GET request with a header from PHP?)
Parse the returned JSON (e.g. using json_decode, http://php.net/manual/de/function.json-decode.php)
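For illustration, here is a minimal sketch of those two steps, written in Python to match the snippet above (a PHP version would follow the same structure, ending in json_decode). It assumes the endpoint still answers the Accept header with JSON as described; the answer does not show the JSON layout, so the key name 'title' used below is a hypothetical placeholder. Inspect the real response first and substitute the key that actually holds the job titles. The helper names build_request, fetch_listing and collect_values are illustrative, not part of any library.

```python
import json
import urllib.request

URL = 'https://chenmed.wd1.myworkdayjobs.com/en-US/jencare/'


def build_request(url=URL):
    """Build a GET request carrying the Accept header seen on the XHR call."""
    req = urllib.request.Request(url)
    req.add_header('Accept', 'application/json,application/xml')
    return req


def fetch_listing(url=URL):
    """Step 1: fetch the page as JSON. Step 2: decode it into Python objects."""
    with urllib.request.urlopen(build_request(url)) as resp:
        return json.loads(resp.read().decode('utf-8'))


def collect_values(node, key):
    """Recursively collect every string stored under `key` anywhere in decoded JSON.

    Useful when the exact nesting of the response is unknown; `key` is an
    assumption you must confirm against the real response.
    """
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key and isinstance(v, str):
                found.append(v)
            else:
                # Descend into nested dicts/lists (including non-string values under `key`)
                found.extend(collect_values(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_values(item, key))
    return found
```

Usage would be along the lines of data = fetch_listing() followed by collect_values(data, 'title'); printing json.dumps(data, indent=2) once is a quick way to find the right key. Walking the whole document instead of hard-coding a path keeps the sketch working if the nesting differs from what you expect.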