python request linkedin-api lxml data-extraction

Scraping LinkedIn with Python only returns '[]'


I am trying to scrape LinkedIn to get the "Current" and "Education" elements (this information is publicly available) from any user profile. However, when I run this code I don't get the information I need, only empty brackets: [].

https://www.linkedin.com/in/bryan-engelhardt-a099204b is the exact link I am using at the moment, and from it I want to scrape the following information: "Current-College of the Holy Cross" and "Education-University of Iowa"

My code is pretty simple:

from lxml import html
import requests

response = requests.get('https://www.linkedin.com/in/bryan-engelhardt-a099204b')
data = html.fromstring(response.text)

print(data.xpath('//title/text()')) #looks for title and prints it
print(data.xpath('//*[@id="topcard"]/div[1]/div/div/table/tbody/tr[1]/td/ol/li/span/a/text()')) # using a direct xpath
print(data.xpath('//*[@id="topcard"]/div[1]/div/div/table/tbody/tr[2]/td/ol/li/a/text()'))

The output looks as follows:

C:\Python34\python.exe "C:/Users/Holy Cross - Summer/Desktop/python/scrape/scrape1.py"
[]
[]
[]
Process finished with exit code 0

I am not sure why it's returning that as a response, as I have tried this with other websites and gotten successful results. It might be LinkedIn trying to block me from getting this information, and if so, how can I get around it?
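
One way to narrow this down is to look at the response itself before parsing it. Even the //title query comes back empty, which suggests the body is not the profile page at all. The following is only a rough sketch: the helper name diagnose is made up for illustration, and the checks are assumptions about what is worth inspecting, not a description of how LinkedIn actually responds.

```python
def diagnose(status_code, body):
    """Give a rough reason why an XPath query might come back empty.

    status_code and body are meant to be response.status_code and
    response.text from a requests.get() call (hypothetical helper).
    """
    if status_code != 200:
        # The server did not return a normal page at all.
        return "request rejected or failed (HTTP {})".format(status_code)
    if not body.strip():
        return "empty body"
    if "<title" not in body.lower():
        return "no <title> element; probably not the profile page"
    return "looks like a normal HTML page; check the XPath next"


# Intended use with the code above (not executed here):
#   response = requests.get('https://www.linkedin.com/in/bryan-engelhardt-a099204b')
#   print(diagnose(response.status_code, response.text))
```

If the first branch fires, the problem is the request being refused rather than the XPath expressions.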


Solution

  • I think you should read LinkedIn's terms of service.

    The LinkedIn robots.txt file states that you need whitelisting to crawl the site:

    # Notice: If you would like to crawl LinkedIn,
    # please email [email protected] to apply
    # for white listing.
    

    I would start by trying to apply for whitelisting.

    You could try to make your bot look like a human visitor by changing the User-Agent header and similar request details, but I wouldn't recommend it.
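
To see what a robots.txt file permits for a given crawler before scraping, the standard library's urllib.robotparser can evaluate the rules for you. This is a minimal sketch: SAMPLE_ROBOTS_TXT, the ExampleAllowedBot user agent, and the example.com URLs are invented for illustration and are not LinkedIn's actual rules.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one whitelisted crawler, everyone else disallowed.
SAMPLE_ROBOTS_TXT = """\
User-agent: ExampleAllowedBot
Disallow: /private/

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    profile = "https://www.example.com/in/some-profile"
    print(is_allowed(SAMPLE_ROBOTS_TXT, "my-scraper", profile))
    print(is_allowed(SAMPLE_ROBOTS_TXT, "ExampleAllowedBot", profile))
```

Against a live site you would call parser.set_url(...) and parser.read() instead of parse(). Either way, robots.txt only tells you what the site asks of crawlers; the terms of service still apply.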