I have been trying to scrape the job offers section of LinkedIn for a while, but to no avail. By the way, I know the site has its own API, but I want to do it with Beautiful Soup since I learned it a while ago and this is for practice.
Here is my code:
import requests
from bs4 import BeautifulSoup
client = requests.Session()
HOMEPAGE_URL = 'https://www.linkedin.com'
LOGIN_URL = 'https://www.linkedin.com/login/en'
URL = 'https://www.linkedin.com/jobs/search/?geoId=101174742&keywords=data%20analyst&location=Canada'
html = client.get(HOMEPAGE_URL).content
soup = BeautifulSoup(html, "html.parser")
login_information = {
    'session_key': '<username>',
    'session_password': '<password>',
    'loginCsrfParam': '<csrftoken>',
}
try:
    p = client.post(LOGIN_URL, data=login_information)
    print("Login Successful")
except:
    print("Failed to Login")
All good until here. I get "Login Successful", but when I check the status code I get 403:
p.status_code
Output: 403
And of course I can't scrape any info. How can I do this properly?
You don't really have to re-invent the wheel. There's a module called, surprise, surprise, linkedin-api for accessing all sorts of LinkedIn data (including jobs) via the so-called Voyager service.
Example usage:
from linkedin_api import Linkedin
# Authenticate using any Linkedin account credentials
api = Linkedin('[email protected]', '*******')
# GET a profile
profile = api.get_profile('billy-g')
# GET a profile's contact info
contact_info = api.get_profile_contact_info('billy-g')
# GET 1st degree connections of a given profile
connections = api.get_profile_connections('1234asc12304')
I'm sharing this because you might have a really hard time scraping LinkedIn with good old BeautifulSoup and requests. Also, a note of caution: do not use your personal account for any scraping activities on LinkedIn.
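As for why the code in the question printed "Login Successful" even though the status code was 403: requests only raises an exception for transport-level problems (DNS failure, timeout, refused connection). An HTTP error status comes back as an ordinary response, so the try/except never reaches the except branch. Here is a minimal sketch of checking the status explicitly. The helper name post_login is made up for illustration, and it assumes session is a requests.Session (or anything with a compatible post method).

```python
def post_login(session, login_url, form_data):
    """POST a login form and report whether the server returned a 2xx status.

    `post_login` is a hypothetical helper, not part of requests.
    `session` is assumed to be a requests.Session or an object with a
    compatible .post(url, data=...) method.
    Returns a (succeeded, status_code) tuple.
    """
    response = session.post(login_url, data=form_data)
    # A 403 (or any other 4xx/5xx) does not raise by itself, so the
    # status code has to be inspected explicitly.
    succeeded = 200 <= response.status_code < 300
    return succeeded, response.status_code


# Usage with the names from the question (needs network access):
#   import requests
#   client = requests.Session()
#   ok, status = post_login(client, LOGIN_URL, login_information)
#   print("Login Successful" if ok else f"Failed to Login (HTTP {status})")
```

Note that a 2xx status is necessary but not sufficient: a site may answer a rejected login with a normal page, so this check only catches outright refusals such as the 403 above. If you prefer exceptions, calling raise_for_status() on the response inside the try block turns 4xx and 5xx statuses into requests.exceptions.HTTPError.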