Tags: python, authentication, openid, urllib2

How to request pages from website that uses OpenID?


This question has been asked here before. The accepted answer was probably obvious to both the asker and the answerer, but not to me. I commented on that question to ask for more detail, but got no response. I also asked on meta how to revive old questions, and got no answer there either.

The accepted answer to that question was:

From the client's perspective, an OpenID login is very similar to any other web-based login. There isn't a defined protocol for the client; it is an ordinary web session that varies based on your OpenID provider. For this reason, I doubt that any such libraries exist. You will probably have to code it yourself.

I already know how to log in to a website with Python using the urllib2 module, but that is not enough for me to work out how to authenticate with OpenID.

I'm actually trying to get my StackOverflow inbox in JSON format, for which I need to be logged in.

Could someone provide a short intro or a link to a nice tutorial on how to do that?


Solution

  • This answer summarizes what others have said below, especially RedBaron, and adds the method I used to reach the StackOverflow inbox through Google Accounts.

    Using the Tamper Data add-on for Firefox while logging in to StackOverflow, one can see that OpenID works this way:

    1. StackOverflow requests authentication from a given service (here Google), defined in the posted data;
    2. Google Accounts takes over and checks for an already existing cookie as proof of authentication;
    3. If no cookie is found, Google requests authentication and sets a cookie;
    4. Once the cookie is set, StackOverflow acknowledges authentication of the user.

    The above is only a summary; the real process is more complicated, since many redirects and cookie exchanges take place.
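To make step 1 more concrete, here is a minimal Python 3 sketch of the kind of URL a site redirects the browser to when it asks an OpenID 2.0 provider to authenticate a user. The `openid.*` parameter names and the two namespace URIs come from the OpenID Authentication 2.0 specification; the endpoint, `return_to` and `realm` values, and the function name `build_auth_request`, are hypothetical and only there for illustration. A real flow also performs discovery of the provider endpoint first, which is left out here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Constants defined by the OpenID Authentication 2.0 specification.
OPENID_NS = "http://specs.openid.net/auth/2.0"
IDENTIFIER_SELECT = "http://specs.openid.net/auth/2.0/identifier_select"


def build_auth_request(provider_endpoint, return_to, realm):
    """Return the URL a relying party would redirect the browser to."""
    params = {
        "openid.ns": OPENID_NS,
        "openid.mode": "checkid_setup",
        # identifier_select lets the provider pick the user's identity.
        "openid.claimed_id": IDENTIFIER_SELECT,
        "openid.identity": IDENTIFIER_SELECT,
        "openid.return_to": return_to,
        "openid.realm": realm,
    }
    return provider_endpoint + "?" + urlencode(params)


# Hypothetical endpoint and return URL, for illustration only.
url = build_auth_request(
    "https://provider.example/openid/auth",
    "https://site.example/users/authenticate/return",
    "https://site.example/",
)
query = parse_qs(urlsplit(url).query)
print(query["openid.mode"][0])
```

The provider answers by redirecting the browser back to `openid.return_to` with its own set of `openid.*` parameters, which is where most of the redirects seen in Tamper Data come from.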

    Reproducing that process programmatically proved difficult (which may just be my inexperience), especially hunting down the URLs to call with all their locale specifics. So I opted for logging in to Google Accounts first, obtaining a cookie, and then logging in to StackOverflow, which uses that cookie for authentication.

    This needs only the following Python modules: urllib, urllib2, cookielib and BeautifulSoup.
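The cookie-reuse idea can be seen in isolation in the sketch below, written for Python 3, where `urllib2` and `cookielib` are named `urllib.request` and `http.cookiejar`. It assumes a toy local server standing in for the real sites: the `/login` and `/inbox` paths, the `session` cookie and its value are all made up. The point is only that one opener with one cookie jar carries the cookie from the first request to the second.

```python
import threading
import urllib.request
from http.cookiejar import CookieJar
from http.server import BaseHTTPRequestHandler, HTTPServer


class DemoHandler(BaseHTTPRequestHandler):
    """Toy server: /login sets a cookie, every other path requires it."""

    def do_GET(self):
        if self.path == "/login":
            self.send_response(200)
            self.send_header("Set-Cookie", "session=abc123; Path=/")
            body = b"logged in"
        elif "session=abc123" in self.headers.get("Cookie", ""):
            self.send_response(200)
            body = b"inbox contents"
        else:
            self.send_response(403)
            body = b"forbidden"
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet


# Port 0 asks the OS for any free port.
server = HTTPServer(("127.0.0.1", 0), DemoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_port

# One jar, one opener: every request made through it shares cookies.
jar = CookieJar()
opener = urllib.request.build_opener(
    urllib.request.ProxyHandler({}),  # bypass any system proxy for the local demo
    urllib.request.HTTPCookieProcessor(jar),
)

opener.open(base + "/login").read()          # the server sets the cookie
session = next(c.value for c in jar if c.name == "session")
inbox = opener.open(base + "/inbox").read()  # the cookie is sent automatically

server.shutdown()
server.server_close()
```

Iterating over the jar, as done for `session` here, is also a way to read a single cookie without reaching into the jar's private attributes.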

    Here is the (simplified) code. It's not perfect, but it does the trick. The extended version can be found on GitHub.

    #!/usr/bin/env python
    
    import urllib
    import urllib2
    import cookielib
    from BeautifulSoup import BeautifulSoup
    from getpass import getpass
    
    # Define URLs
    google_accounts_url = 'http://accounts.google.com'
    authentication_url = 'https://accounts.google.com/ServiceLoginAuth'
    stack_overflow_url = 'https://stackoverflow.com/users/authenticate'
    genuwine_url = 'https://stackoverflow.com/inbox/genuwine'
    
    # Build opener
    jar = cookielib.CookieJar()
    opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(jar))
    
    def request_url(request):
        '''
            Opens the given request with the cookie-aware opener
            and returns the response.
        '''
        return opener.open(request)
    
    
    def authenticate(username='', password=''):        
        '''
            Authenticates to Google Accounts using user-provided username and password,
            then authenticates to StackOverflow.
        '''
        # Build up headers
        user_agent = 'Mozilla/5.0 (Ubuntu; X11; Linux i686; rv:8.0) Gecko/20100101 Firefox/8.0'
        headers = {'User-Agent' : user_agent}
    
        # No POST data: the first request is a plain GET
        data = None
    
        # Build up URL request with headers and data    
        request = urllib2.Request(google_accounts_url, data, headers)
        response = request_url(request)
    
        # Build up POST data for authentication
        html = response.read()
        dsh = BeautifulSoup(html).findAll(attrs={'name' : 'dsh'})[0].get('value').encode()
    
        auto = response.headers.getheader('X-Auto-Login')
    
        follow_up = urllib.unquote(urllib.unquote(auto)).split('continue=')[-1]
    
        galx = jar._cookies['accounts.google.com']['/']['GALX'].value
    
        values = {'continue' : follow_up,
                  'followup' : follow_up,
                  'dsh' : dsh,
                  'GALX' : galx,
                  'pstMsg' : 1,
                  'dnConn' : 'https://accounts.youtube.com',
                  'checkConnection' : '',
                  'checkedDomains' : '',
                  'timeStmp' : '',
                  'secTok' : '',
                  'Email' : username,
                  'Passwd' : password,
                  'signIn' : 'Sign in',
                  'PersistentCookie' : 'yes',
                  'rmShown' : 1}
    
        data = urllib.urlencode(values)
    
        # Build up URL for authentication
        request = urllib2.Request(authentication_url, data, headers)
        response = request_url(request)
    
        # Check if logged in: a successful login redirects away from the authentication URL
        if response.url != request.get_full_url():
            print '\n Logged in :)\n'
        else:
            print '\n Log in failed :(\n'
    
        # Build OpenID Data    
        values = {'oauth_version' : '',
                  'oauth_server' : '',
                  'openid_username' : '',
                  'openid_identifier' : 'https://www.google.com/accounts/o8/id'}
    
        data = urllib.urlencode(values)
    
        # Build up URL for OpenID authentication
        request = urllib2.Request(stack_overflow_url, data, headers)
        response = request_url(request)
    
        # Retrieve the inbox from the genuwine URL
        data = None
        request = urllib2.Request(genuwine_url, data, headers)
        response = request_url(request)
        print response.read()
    
    
    if __name__ == '__main__':
        username = raw_input('Enter your Gmail address: ')
        password = getpass('Enter your password: ')
        authenticate(username, password)
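The script above reads the hidden `dsh` field from the sign-in form with BeautifulSoup 3, which only runs on Python 2. As a rough sketch of the same step with nothing but the Python 3 standard library, the hypothetical helper `hidden_inputs` below collects every hidden form field into a dict. The sample form is invented for illustration and is not Google's actual markup.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        # Attribute values may be None when written without "=value".
        is_hidden = (attributes.get("type") or "").lower() == "hidden"
        if is_hidden and attributes.get("name"):
            self.fields[attributes["name"]] = attributes.get("value") or ""


def hidden_inputs(html):
    """Return a dict of all hidden input fields found in the HTML text."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields


# Made-up form, loosely shaped like the fields the script posts back.
sample = '''
<form action="/ServiceLoginAuth" method="post">
  <input type="hidden" name="dsh" value="-123456789">
  <input type="hidden" name="GALX" value="token">
  <input type="text" name="Email">
</form>
'''
fields = hidden_inputs(sample)
print(fields["dsh"])
```

The resulting dict can be merged into the `values` dict before URL-encoding, so that hidden fields the server expects are posted back unchanged.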