python, python-2.7, urllib2

CSV download with urllib2 only returns the webpage HTML


I am having issues downloading a CSV file with urllib2. Here is the code I am using:

import cookielib
import urllib
import urllib2
import csv

# login_website, csv_url, username and password are set elsewhere (omitted here).

# Keep cookies between requests so the login session carries over.
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
authentication_url = login_website
payload = {
    'altxuname': username,
    'altxpass': password,
    'Submit': 'Login'
}
urllib2.install_opener(opener)
data = urllib.urlencode(payload)
# POST the credentials, then request the CSV with the same opener.
opener.open(authentication_url, data)
resp = opener.open(csv_url)
contents = csv.reader(resp)

# Write the parsed rows to a local file.
with open('logintest.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(contents)

I have tested the code on CSV files from websites that do not require a login. I have also tested the login part on its own, and I am able to log in and navigate to the webpage. But when I combine the two, logging in and then downloading the file, all I get in the CSV file is the HTML of the webpage. The URL I am using for csv_url is a direct link to download the file. Any help would be awesome! Thanks!

Edit: Here is the code for the button click:

<input name="exportcsv" type="button" class="button"     onclick="location.href='/techforce/report.php?report_id=129&amp;rf_67=&amp;rf_31=08%2F04%2F2014&amp;rt_31=&amp;rf_448=&amp;rt_448=&amp;rf_64=&amp;rt_64=&amp;rf_387=c.state&amp;rf_387_Op=equals&amp;rf_387_Value=&amp;rf_148=c.state&amp;rf_148_Op=equals&amp;rf_148_Value=&amp;rf_223_Op=&amp;rf_55_Op=&amp;rf_46_Op=&amp;Submit=Display+Report&amp;export=csv&amp;csv=true'" value="Export CSV">
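
One thing worth checking in that markup: the href inside onclick is a relative path, and its ampersands are HTML-escaped as &amp;. If csv_url was copied straight from the page source, both may need fixing before urllib2 can request it. A minimal sketch of that conversion follows. The host example.com and the helper name absolute_export_url are placeholders, not taken from the site in question, and the href is shortened for readability.

```python
try:
    from urlparse import urljoin        # Python 2, as used in the question
except ImportError:
    from urllib.parse import urljoin    # Python 3


def absolute_export_url(page_url, onclick_href):
    """Turn the HTML-escaped, relative href from the onclick into a full URL."""
    # In HTML source, '&amp;' stands for a literal '&' in the real URL.
    raw = onclick_href.replace('&amp;', '&')
    # Resolve the relative path against the page that contains the button.
    return urljoin(page_url, raw)


# ASSUMPTION: placeholder host; the real one is not given in the question.
page_url = 'http://example.com/techforce/report.php'
# Shortened version of the href in the button's onclick attribute.
href = '/techforce/report.php?report_id=129&amp;export=csv&amp;csv=true'
csv_url = absolute_export_url(page_url, href)
```

Whether this is the actual cause cannot be told from the question alone; it only rules out one way of ending up on an HTML page instead of the export.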

Solution

  • This is a website-specific problem, so there is no generic solution. That said, try the mechanize library for Python; it can solve this type of problem quickly. Mechanize works like a browser in your code.
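
A rough sketch of what the mechanize approach could look like, reusing the field names from the question (altxuname, altxpass). It assumes the login form is the first form on the login page, which is a guess about the site. The function name download_report and the optional browser parameter are made up for this example. It also refuses to save the response if it looks like HTML, so a failed login is noticed instead of silently written to disk.

```python
def download_report(login_url, csv_url, username, password, out_path,
                    browser=None):
    """Log in through the site's form, then save the CSV export to out_path."""
    if browser is None:
        import mechanize  # third-party package: pip install mechanize
        browser = mechanize.Browser()
        browser.set_handle_robots(False)  # do not fetch or obey robots.txt
        browser.addheaders = [('User-agent', 'Mozilla/5.0')]

    # Load the login page and fill in the form the way a browser would.
    browser.open(login_url)
    browser.select_form(nr=0)  # ASSUMPTION: the login form is the first form
    browser['altxuname'] = username
    browser['altxpass'] = password
    browser.submit()

    # The browser object keeps the session cookies, so this request is
    # made as the logged-in user.
    body = browser.open(csv_url).read()

    # A response that starts with '<' is almost certainly an HTML page
    # (for example the login form again), not CSV data.
    if body.lstrip().startswith(b'<'):
        raise ValueError('got HTML instead of CSV; the login may have failed')

    with open(out_path, 'wb') as f:
        f.write(body)
    return body
```

It would be called as download_report(login_website, csv_url, username, password, 'logintest.csv'), with the same variables as in the question. If the ValueError is raised, the next step is to inspect the returned page for extra hidden form fields or a required Referer header that the site expects.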