python twitter web-scraping

Scrape a website that requires Twitter login with Python


I recently had to scrape a site with Python that requires a Twitter login. It wasn't as straightforward as I expected, and I'm running into a lot of problems:

I'm trying to scrape data from https://www.scoutzen.com/twitter-lists/search?page=1&q=luxury, and I can only access the results when I'm logged in to Twitter. So I tried to log in to Twitter with Python and then send the request to the site I want. Here is my code:

import requests
from lxml import html

session_requests = requests.session()

# Fetch the login page and parse it with lxml to read the hidden authenticity_token field
result = session_requests.get("https://twitter.com/login")
tree = html.fromstring(result.text)
authenticity_token = list(set(tree.xpath("//input[@name='authenticity_token']/@value")))[0]

payload = {
    'action': 'login',
    'session[username_or_email]': '[email protected]',
    'session[password]': 'pass',
    'authenticity_token': authenticity_token
}

# Submit the login form in the same session so the Twitter cookies are kept
result = session_requests.post(
    "https://twitter.com/login",
    data=payload,
    headers=dict(referer="https://twitter.com/login")
)

# Scrape url
result = session_requests.get(
    "https://www.scoutzen.com/twitter-lists/search?q=luxury",
    headers=dict(referer="https://www.scoutzen.com/twitter-lists/search?q=luxury")
)

print(result.text)

I checked that the Twitter login succeeded, but I realized that the website www.scoutzen.com still requires a login.

Could it be related to cookies? Or should I try another package to log in?

I would appreciate any help. Many thanks.


Solution

  • An easy way to deal with this problem is to use Selenium, which drives a real web browser and can be controlled from Python. It behaves just like your own browser, so it manages cookies and everything else for you. It also executes JavaScript, so content rendered by scripts is available too.

    Check the Selenium Starter Guide
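
As a rough illustration of the Selenium approach, here is a minimal sketch. It assumes Selenium is installed along with Firefox and its driver, and that you log in by hand in the window that opens. The function names `fetch_after_login` and `cookies_for_domain`, and the `wait_seconds` pause, are made up for this example and are not part of Selenium. The helper also shows why the original attempt probably failed: a cookie is tied to the domain that set it, so cookies obtained from twitter.com are not the ones www.scoutzen.com checks.

```python
import time


def cookies_for_domain(browser_cookies, domain):
    """Pick the cookies that belong to `domain` from Selenium's cookie list."""
    wanted = {}
    for cookie in browser_cookies:
        # driver.get_cookies() returns dicts with "name", "value", "domain", ...
        cookie_domain = cookie.get("domain", "").lstrip(".")
        if cookie_domain == domain or cookie_domain.endswith("." + domain):
            wanted[cookie["name"]] = cookie["value"]
    return wanted


def fetch_after_login(url, domain, wait_seconds=60):
    """Open `url` in a real browser, wait for a manual login, return HTML and cookies."""
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver

    driver = webdriver.Firefox()  # assumption: Firefox and its driver are available
    try:
        driver.get(url)
        # Hypothetical pause: log in with Twitter by hand in the opened window.
        time.sleep(wait_seconds)
        driver.get(url)  # reload the page now that the site has its own session
        return driver.page_source, cookies_for_domain(driver.get_cookies(), domain)
    finally:
        driver.quit()
```

Calling `fetch_after_login("https://www.scoutzen.com/twitter-lists/search?q=luxury", "scoutzen.com")` would return the page source together with the cookies the site itself set. Those cookies could then be passed to a `requests` session if you prefer to continue without the browser.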