python web-scraping beautifulsoup linkedin-api http-status-code-403

Difficulties when web scraping job offers on LinkedIn


I have been trying to scrape the job offers section of LinkedIn for a while, but to no avail. I know the site has its own API, but I want to do it with Beautiful Soup, since I learned it a while ago and this is for practice.

Here is my code:

import requests
from bs4 import BeautifulSoup

client = requests.Session()

HOMEPAGE_URL = 'https://www.linkedin.com'
LOGIN_URL = 'https://www.linkedin.com/login/en'
URL = 'https://www.linkedin.com/jobs/search/?geoId=101174742&keywords=data%20analyst&location=Canada'

html = client.get(HOMEPAGE_URL).content
soup = BeautifulSoup(html, "html.parser")

login_information = {
    'session_key':'<username>',
    'session_password':'<password>',
    'loginCsrfParam': '<csrftoken>',
}
try:
    p = client.post(LOGIN_URL, data=login_information)
    print ("Login Successful")
except:
    print ("Failed to Login")

All good until here. I get "Login Successful", but when I check the status code of the response I get 403:

p.status_code
Output: 403

And of course I can't scrape any information. How can I do this properly?


Solution

  • You don't really have to reinvent the wheel. There's a module called, surprise, surprise, linkedin-api that gives access to all sorts of LinkedIn data (including jobs) via the so-called Voyager service.

    Example usage:

    from linkedin_api import Linkedin
    
    # Authenticate using any LinkedIn account credentials (placeholders shown here)
    api = Linkedin('user@example.com', '*******')
    
    # GET a profile
    profile = api.get_profile('billy-g')
    
    # GET a profile's contact info
    contact_info = api.get_profile_contact_info('billy-g')
    
    # GET 1st degree connections of a given profile
    connections = api.get_profile_connections('1234asc12304')
    

    I'm sharing this because you might have a really hard time scraping LinkedIn with good old BeautifulSoup and requests. Also, a note of caution: do not use your personal account for any scraping activities on LinkedIn.
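
As for why the question's code prints "Login Successful" even though the status code is 403: requests does not raise an exception for 4xx or 5xx responses by default, so the try/except there only catches connection-level failures. Below is a minimal sketch of checking the status explicitly. The helper name `login_succeeded` and the `SimpleNamespace` stand-ins are hypothetical, used only so the example runs without contacting LinkedIn.

```python
from types import SimpleNamespace


def login_succeeded(response) -> bool:
    """Return True only if the response carries a 2xx status code.

    `response` is any object with a `status_code` attribute, for example the
    `requests.Response` returned by `client.post(...)` in the question.
    """
    return 200 <= response.status_code < 300


# Offline stand-ins for real responses (hypothetical, for illustration only).
forbidden = SimpleNamespace(status_code=403)
accepted = SimpleNamespace(status_code=200)

print(login_succeeded(forbidden))  # False
print(login_succeeded(accepted))   # True
```

With a real session you would pass `p` from `client.post(LOGIN_URL, data=login_information)` to this helper, or call `p.raise_for_status()`, which raises `requests.HTTPError` on a 4xx or 5xx status. A 2xx status alone probably does not prove the login worked either, so treat this check as a necessary condition rather than a sufficient one.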