Search code examples
phpweb-scrapingweb-crawlerdata-extraction

Scrape and Extract data from https://chenmed.wd1.myworkdayjobs.com/en-US/jencare/ when it is not visible in the 'Source Code' of the webpage


I am trying to write an automated PHP script to scrape and extract all 'Job Titles' (Primary Care Physician - Tidewater Market, Primary Care Physician - Richmond Market etc.) from URL https://chenmed.wd1.myworkdayjobs.com/en-US/jencare/

However, this does not seem to be straightforward because the required data is not directly visible in the source code of the webpage. I also tried inspecting 'Developer Tools->Network' of different browsers, however could not locate the source of the data.

Any help would be highly appreciated.

Thanks & Regards!


Solution

  • Looking at the requests made by the website one notices an XHR request that contains the data you care about:

    enter image description here

    However visiting that URL in a browser gives the same result as navigating to https://chenmed.wd1.myworkdayjobs.com/en-US/jencare/. Investigating further by looking at the request headers

    enter image description here

    one notices the Accept:application/json,application/xml (which signifies that the client expect a json or xml document). Indeed it turns out to be true that requesting https://chenmed.wd1.myworkdayjobs.com/en-US/jencare/ with this additional header returns the desired data:

    >>> import urllib.request
    >>> req = urllib.request.Request('https://chenmed.wd1.myworkdayjobs.com/en-US/jencare/')
    >>> req.add_header('Accept', 'application/json,application/xml')
    >>> urllib.request.urlopen(req).read().decode('utf-8').find('Primary Care Physician ') > 0
    True
    

    Therefore in PHP you probably want to do the following steps:

    1. Request ttps://chenmed.wd1.myworkdayjobs.com/en-US/jencare/ with the additional header Accept:application/json,application/xml (see e.g. How do I send a GET request with a header from PHP?)
    2. Parse the returned JSON (e.g. using http://php.net/manual/de/function.json-decode.php)