Login to a website with Google using BeautifulSoup and Python 2.7

I am writing a Python web crawler for Quora, but I need to log in with Google first. I have searched the web and found nothing that solves my problem. Here is my code:

# -*- coding: utf-8 -*-
import mechanize
import os
import requests
import urllib
import urllib2
from bs4 import BeautifulSoup
import cookielib

# Store the cookies and create an opener that will hold them
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

# Add our headers
opener.addheaders = [('User-agent', 'RedditTesting')]

# Install our opener (note that this changes the global opener to the one
# we just made, but you can also just call opener.open() if you want)
urllib2.install_opener(opener)

# The action/ target from the form
authentication_url = ''

# Input parameters we are going to send
payload = {
  'op': 'login-main',
  'user': '<username>',
  'passwd': '<password>'
}

# Use urllib to encode the payload
data = urllib.urlencode(payload)

# Build our Request object (supplying 'data' makes it a POST)
req = urllib2.Request(authentication_url, data)

# Make the request and read the response
resp = urllib2.urlopen(req)
contents = resp.read()

# specify the url
quote_page = ""

# query the website and return the html to the variable `page`
page = urllib2.urlopen(quote_page)

# parse the html using beautiful soup and store in variable `soup`
soup = BeautifulSoup(page, 'html.parser')

# Take out the <div> of name and get its value
name_box = soup.find('div', attrs={"class": "ContentWrapper"})

name = name_box.text.strip()  # strip() removes leading and trailing whitespace

print name

# Download every image on the page and save it under its own file name
for link in soup.find_all('img'):
    image = link.get("src")

    image_name = os.path.split(image)[1]
    r2 = requests.get(image)
    with open(image_name, "wb") as f:
        f.write(r2.content)

As I don't have a separate username for the site, I log in with my own Gmail account. To log in, I adapted some code from a different question, but it does not work.

Any indentation errors are due to my lousy formatting.


  • To log in and scrape, use a Session: make a POST request with your credentials as the payload, then scrape with the same session.

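    import requests
    from bs4 import BeautifulSoup

    with requests.Session() as s:
        # POST the credentials; the session keeps any cookies the site sets
        p = s.post("", data={
            "email": '*******',
            "password": "*************"
        })
        # Later requests on the same session reuse those cookies
        base_page = s.get('')
        soup = BeautifulSoup(base_page.content, 'html.parser')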
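
The payload keys and the POST target depend on the site's own login form, so it can help to read them from the page instead of guessing. Below is a minimal sketch of that idea, assuming the site serves a plain HTML form. The helper `build_login_request`, the sample HTML, the `csrf_token` field, and the example.com URLs are all made up for illustration. A "Log in with Google" button that redirects to Google's own sign-in pages is not covered by this approach.

```python
from bs4 import BeautifulSoup

try:
    from urllib.parse import urljoin  # Python 3
except ImportError:
    from urlparse import urljoin  # Python 2.7


def build_login_request(login_page_html, page_url, credentials):
    """Return (post_url, payload) for the first <form> on a login page.

    Hypothetical helper: copies every named <input> (including hidden
    token fields), then overlays the supplied credentials.
    """
    soup = BeautifulSoup(login_page_html, 'html.parser')
    form = soup.find('form')
    if form is None:
        raise ValueError('no <form> found on the page')

    payload = {}
    for field in form.find_all('input'):
        name = field.get('name')
        if name:  # unnamed inputs (e.g. a bare submit button) are not sent
            payload[name] = field.get('value', '')
    payload.update(credentials)

    # A relative action is resolved against the URL of the page itself.
    post_url = urljoin(page_url, form.get('action') or '')
    return post_url, payload


# Made-up login page, for illustration only.
SAMPLE_HTML = """
<form action="/session" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="email">
  <input type="password" name="password">
  <input type="submit" value="Log in">
</form>
"""

post_url, payload = build_login_request(
    SAMPLE_HTML,
    'https://example.com/login',
    {'email': 'user@example.com', 'password': 'secret'},
)
```

Inside the `with requests.Session() as s:` block, the login page would first be fetched with `s.get(...)`, and the resulting pair could then be sent with `s.post(post_url, data=payload)`.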