I am trying to scrape LinkedIn to get the Current and Education elements (this information is publicly available) from any user profile. However, when I run this code I don't get the information I need, only empty lists [].
https://www.linkedin.com/in/bryan-engelhardt-a099204b is the exact link I am using at the moment, and from it I want to scrape the following information: "Current-College of the Holy Cross" and "Education-University of Iowa".
My code is pretty simple:
```python
from lxml import html
import requests

response = requests.get('https://www.linkedin.com/in/bryan-engelhardt-a099204b')
data = html.fromstring(response.text)  # parse the raw HTML into an element tree
print(data.xpath('//title/text()'))  # look for the <title> text and print it
# Direct XPaths, intended to match the "Current" and "Education" entries
print(data.xpath('//*[@id="topcard"]/div[1]/div/div/table/tbody/tr[1]/td/ol/li/span/a/text()'))
print(data.xpath('//*[@id="topcard"]/div[1]/div/div/table/tbody/tr[2]/td/ol/li/a/text()'))
```
The output looks as follows:
```
C:\Python34\python.exe "C:/Users/Holy Cross - Summer/Desktop/python/scrape/scrape1.py"
[]
[]
[]

Process finished with exit code 0
```
I am not sure why it returns that response, since I have tried this with other websites and gotten results. It might be LinkedIn trying to block me from getting this information; if it is, how can I get around it?
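A quick way to test the blocking theory is to look at the status code and the start of the raw body before running any XPath. The helper below, `summarize_response`, is hypothetical (it is not part of the original script); it just collapses the body to a one-line snippet. If the body is empty or is some page other than the profile you see in the browser, the XPaths have nothing to match, which would explain the []. LinkedIn has often been reported to answer non-browser clients with the non-standard status code 999.

```python
def summarize_response(status_code, body, limit=80):
    """Return a one-line summary of an HTTP response (hypothetical helper)."""
    # Collapse runs of whitespace so the snippet fits on one line,
    # then keep only the first `limit` characters.
    snippet = " ".join(body.split())[:limit]
    if not snippet:
        snippet = "<empty body>"
    return "HTTP {}: {}".format(status_code, snippet)


if __name__ == "__main__":
    # Offline demo with a canned, empty body.
    print(summarize_response(999, ""))  # HTTP 999: <empty body>
```

In the question's script you would call it as `summarize_response(response.status_code, response.text)` right after the `requests.get(...)` line; a status other than 200 means the server did not return the page normally.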
I think you should read LinkedIn's terms of service.
LinkedIn's robots.txt file states that you need to be whitelisted to crawl the site:
```
# Notice: If you would like to crawl LinkedIn,
# please email [email protected] to apply
# for white listing.
```
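Python's standard library can evaluate robots.txt rules for you with `urllib.robotparser`. This sketch parses a small made-up robots.txt (the `SAMPLE_ROBOTS_TXT` contents, the bot name `ExampleWhitelistedBot`, and the `allowed` helper are all hypothetical; this is not LinkedIn's actual file) and asks whether a given user agent may fetch the profile URL.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt for illustration only; it is NOT LinkedIn's real file.
# One named bot may crawl everything; every other agent is disallowed.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleWhitelistedBot
Allow: /

User-agent: *
Disallow: /
"""


def allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://www.linkedin.com/in/bryan-engelhardt-a099204b"
    print(allowed(SAMPLE_ROBOTS_TXT, "my-scraper", url))             # False
    print(allowed(SAMPLE_ROBOTS_TXT, "ExampleWhitelistedBot", url))  # True
```

To check the live file instead, `RobotFileParser("https://www.linkedin.com/robots.txt")` followed by `.read()` downloads and parses it before you call `can_fetch()`.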
I would start by trying to apply for whitelisting.
You could try to make your bot look like a human by playing with the user agent and whatnot, but I wouldn't recommend it.