
Python - How to read the content of a web page without using its URL?


I am trying to write a Python program that logs in to Gmail and reads the inbox page. This is what I have tried with Selenium and urllib2 (I am new to both):

from selenium import webdriver
import urllib2


def gmail_login(username, passw):
    webpage = r'https://accounts.google.com/ServiceLogin?service=mail&passive=true&rm=false&continue=https://mail.google.com/mail/&ss=1&scc=1&ltmpl=default&ltmplcache=2&emr=1&osid=1#identifier'

    # Raw string so the backslashes in the Windows path are never treated as escapes.
    driver = webdriver.Chrome(r'C:\Users\chromedriver_win32\chromedriver.exe')
    driver.get(webpage)

    # Poll for up to 10 seconds when an element is not found immediately.
    driver.implicitly_wait(10)

    driver.find_element_by_name('Email').send_keys(username)
    driver.find_element_by_name('signIn').click()  # Click 'Next' after entering the email address.

    driver.find_element_by_id('Passwd').send_keys(passw)
    driver.find_element_by_id('signIn').click()  # Click 'Sign in' after entering the password.

    # URL of the page Chrome is showing after sign-in (the inbox).
    url = driver.current_url

    readPage(url)


def readPage(url):
    print url

    fName = "gmail_file.html"
    # urllib2 makes its own, separate HTTP request; it does not share Chrome's cookies.
    response = urllib2.urlopen(url)
    html = response.read()
    with open(fName, "w") as f:
        f.write(html)


gmail_login('username', 'password')

The login part works, but I can't read the inbox page. My code reopens the inbox page from its URL, reads it, and saves it to an HTML file, yet the saved file contains only the login page! My guess is that opening the inbox directly by its URL is not allowed because the page is protected.
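A likely reason the saved file shows the login page: `urllib2.urlopen()` sends a brand-new HTTP request that carries none of the cookies Chrome received during sign-in, so Google sees an anonymous visitor. The sketch below (Python 3, where `urllib2` became `urllib.request`) illustrates that cause by forwarding the browser's cookies by hand. `request_with_browser_cookies` is a hypothetical helper name, and `driver` is assumed to be a Selenium WebDriver. Note that `get_cookies()` only returns cookies for the domain currently loaded, and Google may still reject such a request, so treat this as an explanation of the behaviour rather than a working Gmail client.

```python
import urllib.request


def request_with_browser_cookies(driver, url):
    """Build a urllib request that carries the browser's cookies.

    Assumes `driver` is a Selenium WebDriver: get_cookies() returns a list
    of dicts with at least "name" and "value" keys for the current domain.
    """
    # Join the cookies into one "name=value; name2=value2" Cookie header.
    cookie_header = "; ".join(
        "{}={}".format(c["name"], c["value"]) for c in driver.get_cookies()
    )
    return urllib.request.Request(url, headers={"Cookie": cookie_header})
```

You would then read it with `urllib.request.urlopen(request_with_browser_cookies(driver, driver.current_url)).read()`; without the header, the request is exactly the cookie-less one the original `readPage` sends.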

So I'm looking for a way to read the content of a web page (any page, not only Gmail) without needing its URL. (The only way I know to read a web page is urlopen(), which requires the URL.) Is there a function or library for this?
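With Selenium you may not need a second request at all: a WebDriver's `page_source` attribute returns the HTML of the page the browser is currently showing, so the inbox can be saved straight from the logged-in session without knowing its URL. A minimal sketch in Python 3, assuming `driver` is the logged-in WebDriver; `save_page_source` is a hypothetical helper meant to stand in for `readPage`. Gmail builds much of its inbox with JavaScript, so you may need to wait for the inbox to finish loading before reading `page_source`.

```python
def save_page_source(driver, file_name="gmail_file.html"):
    """Write the HTML the browser is currently displaying to file_name.

    Assumes `driver` is a Selenium WebDriver; its page_source attribute is
    read from the live browser session, so no URL or new request is needed.
    """
    html = driver.page_source
    # Explicit encoding so non-ASCII text in the page is written safely.
    with open(file_name, "w", encoding="utf-8") as f:
        f.write(html)
    return html
```

In `gmail_login`, calling `save_page_source(driver)` in place of `readPage(url)` would save whatever Chrome is displaying after sign-in.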


Solution

  • You could use Charlie Guo's gmail package. Once installed, you can use it like this:

    import gmail
    
    g = gmail.login("you@gmail.com", "password123")
    
    emails = g.inbox().mail(unread=True)
    
    for email in emails:
        email.fetch()
        header_from = email.headers['From']
        subject = email.headers['Subject']
        body = email.body
        # ... do something cool with your Gmail ...
    

    That's going to be much more reliable and simpler than screen scraping.
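If you would rather not add a third-party package, Python's standard library can also read Gmail over IMAP. The sketch below assumes `conn` is an `imaplib.IMAP4_SSL("imap.gmail.com")` connection on which `login()` has already succeeded (the account may need IMAP enabled and an app password or OAuth for that to work); `unread_senders_and_subjects` is a hypothetical helper name.

```python
import email


def unread_senders_and_subjects(conn):
    """Return (From, Subject) pairs for the unread messages in INBOX.

    Assumes `conn` is a logged-in imaplib.IMAP4_SSL connection.
    """
    # readonly=True opens the mailbox with EXAMINE, so fetching does not mark mail as read.
    conn.select("INBOX", readonly=True)
    _, data = conn.search(None, "UNSEEN")
    results = []
    for num in data[0].split():
        _, msg_data = conn.fetch(num, "(RFC822)")
        # msg_data[0] is a (response header, raw message bytes) tuple.
        msg = email.message_from_bytes(msg_data[0][1])
        results.append((msg["From"], msg["Subject"]))
    return results
```

Typical use: `conn = imaplib.IMAP4_SSL("imap.gmail.com")`, `conn.login(user, password)`, then `unread_senders_and_subjects(conn)`, and finally `conn.logout()`.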