Tags: python, html, beautifulsoup, urllib

Can't find hrefs of interest with BeautifulSoup


I am trying to collect a list of hrefs from the Netflix careers site: https://jobs.netflix.com/search. Each job listing on this site has an anchor with a class: <a class="css-2y5mtm essqqm81">. To be thorough, the entire anchor is:

<a class="css-2y5mtm essqqm81" role="link" href="/jobs/244837014" aria-label="Manager, Written Communications"\>\
<span tabindex="-1" class="css-1vbg17 essqqm80"\>\<h4 class="css-hl3xbb e1rpdjew0"\>Manager, Written Communications\</h4\>\</span\>\</a\>

Again, the information of interest here is the hrefs of the form href="/jobs/244837014". However, when I perform the standard BS commands to read the HTML:

import urllib.request
from bs4 import BeautifulSoup

html_page = urllib.request.urlopen("https://jobs.netflix.com/search")
soup = BeautifulSoup(html_page, "html.parser")

I don't see any of the hrefs that I'm interested in inside of soup.

Running the following loop does not show the hrefs of interest:

for link in soup.find_all('a'):
    print(link.get('href'))

What am I doing wrong?


Solution

  • That information is fed into the page dynamically via XHR calls, so it never appears in the static HTML that urllib downloads. You need to scrape the API endpoint instead to get the job info. The following code will give you a dataframe with all jobs currently listed by Netflix:

    import requests
    import pandas as pd
    from tqdm import tqdm  # if running in Jupyter: from tqdm.notebook import tqdm

    pd.set_option('display.max_columns', None)
    pd.set_option('display.max_colwidth', None)

    # browser-like headers so the API treats the request as coming from the search page
    headers = {
        'referer': 'https://jobs.netflix.com/search',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'
    }
    big_df = pd.DataFrame()
    s = requests.Session()
    s.headers.update(headers)
    # the search API is paginated; 19 pages covered all postings at the time of writing,
    # so adjust the range if the number of listings changes
    for x in tqdm(range(1, 20)):
        url = f'https://jobs.netflix.com/api/search?page={x}'
        r = s.get(url)
        df = pd.json_normalize(r.json()['records']['postings'])
        big_df = pd.concat([big_df, df], axis=0, ignore_index=True)

    print(big_df[['text', 'team', 'external_id', 'updated_at', 'created_at', 'location', 'organization']])
    

    Result:

    100% 19/19 [00:29<00:00, 1.42s/it]

    |     | text | team | external_id | updated_at | created_at | location | organization |
    |-----|------|------|-------------|------------|------------|----------|--------------|
    | 0   | Events Manager - SEA | [Publicity] | 244936062 | 2022-11-23T07:20:16+00:00 | 2022-11-23T04:47:29Z | Bangkok, Thailand | [Marketing and PR] |
    | 1   | Manager, Written Communications | [Publicity] | 244837014 | 2022-11-23T07:20:16+00:00 | 2022-11-22T17:30:06Z | Los Angeles, California | [Marketing and Publicity] |
    | 2   | Manager, Creative Marketing - Korea | [Marketing] | 244740829 | 2022-11-23T07:20:16+00:00 | 2022-11-22T07:39:56Z | Seoul, South Korea | [Marketing and PR] |
    | 3   | Administrative Assistant - Philippines | [Netflix Technology Services] | 244683946 | 2022-11-23T07:20:16+00:00 | 2022-11-22T01:26:08Z | Manila, Philippines | [Corporate Functions] |
    | 4   | Associate, Studio FP&A - APAC | [Finance] | 244680097 | 2022-11-23T07:20:16+00:00 | 2022-11-22T01:01:17Z | Seoul, South Korea | [Corporate Functions] |
    | ... | ... | ... | ... | ... | ... | ... | ... |
    | 365 | Software Engineer (L4/L5) - Content Engineering | [Core Engineering, Studio Technologies] | 77239837 | 2022-11-23T07:20:31+00:00 | 2021-04-22T07:46:29Z | Mexico City, Mexico | [Product] |
    | 366 | Distributed Systems Engineer (L5) - Data Platform | [Data Platform] | 201740355 | 2022-11-23T07:20:31+00:00 | 2021-03-12T22:18:57Z | Remote, United States | [Product] |
    | 367 | Senior Research Scientist, Computer Graphics / Computer Vision / Machine Learning | [Data Science and Engineering] | 227665988 | 2022-11-23T07:20:31+00:00 | 2021-02-04T18:54:10Z | Los Gatos, California | [Product] |
    | 368 | Counsel, Content - Japan | [Legal and Public Policy] | 228338138 | 2022-11-23T07:20:31+00:00 | 2020-11-12T03:08:04Z | Tokyo, Japan | [Corporate Functions] |
    | 369 | Associate, FP&A | [Financial Planning and Analysis] | 46317422 | 2022-11-23T07:20:31+00:00 | 2017-12-26T19:38:32Z | Los Angeles, California | [Corporate Functions] |

    370 rows × 7 columns

    For each job, the URL would be https://jobs.netflix.com/jobs/{external_id}
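
    For example, here is a minimal sketch of building those URLs from the same API endpoint (it fetches a single page, reuses the browser-like headers from above, and the url column name is just an arbitrary choice):

    import requests
    import pandas as pd

    headers = {
        'referer': 'https://jobs.netflix.com/search',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/105.0.0.0 Safari/537.36'
    }
    # fetch one page of postings and turn each external_id into a full job URL
    r = requests.get('https://jobs.netflix.com/api/search?page=1', headers=headers)
    df = pd.json_normalize(r.json()['records']['postings'])
    df['url'] = 'https://jobs.netflix.com/jobs/' + df['external_id'].astype(str)
    print(df[['text', 'url']].head())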