Search code examples
pythonpycurl

pycurl script can't login to website


I'm currently trying to get a grasp on pycurl. I'm attempting to login to a website. After logging into the site it should redirect to the main page. However when trying this script it just gets returned to the login page. What might I be doing wrong?

import pycurl
import urllib
import StringIO

pf = {'username' : 'user', 'password' : 'pass' }
fields = urllib.urlencode(pf)
pageContents = StringIO.StringIO()

p = pycurl.Curl()
p.setopt(pycurl.FOLLOWLOCATION, 1)
p.setopt(pycurl.COOKIEFILE, './cookie_test.txt')
p.setopt(pycurl.COOKIEJAR, './cookie_test.txt')
p.setopt(pycurl.POST, 1)
p.setopt(pycurl.POSTFIELDS, fields)
p.setopt(pycurl.WRITEFUNCTION, pageContents.write)
p.setopt(pycurl.URL, 'http://localhost')
p.perform()

pageContents.seek(0)
print pageContents.readlines()

EDIT: As pointed out by Peter the URL should point to a login URL but the site I'm trying to get this to work for fails to show me what URL this would be. The form's action just points to the home page ( /index.html )


Solution

  • As you're troubleshooting this problem, I suggest getting a browser plugin like FireBug or LiveHTTPHeaders (I suggest Firefox plugins, but there are similar plugins for other browsers as well). Then you can exercise a request to the site and see what action (URL), method, and form parameters are being passed to the target server. This will likely help elucidate the crux of the problem.

    If that's no help, you may consider using a different tool for your mechanization. I've used ClientForm and BeautifulSoup to perform similar operations. Based on what I've read in the pycURL docs and your code above, ClientForm might be a better tool to use. ClientForm will parse your HTML page, locate the forms on it (including login forms), and construct the appropriate request for you based on the answers you supply to the form. You could even use ClientForm with pycURL... but at least ClientForm will provide you with the appropriate action to which to POST, and construct all of the appropriate parameters.

    Be aware, though, that if there is JavaScript handling any necessary part of the login form, even ClientForm can't help you there. You will need something that interprets the JavaScript to effectively automate the login. In that case, I've used SeleniumRC to control a browser (and I let the browser handle the JavaScript).