I have checked other posts, but the solutions did not seem to work. I keep getting the error AttributeError: 'module' object has no attribute 'urlopen'. Any ideas why this doesn't work would be greatly appreciated.
from lxml import html
import requests
import urllib3
page = requests.get('http://www.sfbos.org/index.aspx?page=18701')
tree = html.fromstring(page.content)
# This will create a list of proposal dates and a list of PDF links:
proposal_doc_date = tree.xpath('//ul[@title="Date List"]/li/a/text()')
pdf_url = tree.xpath('//ul[@title="Date List"]/li/a/@href')
print 'Proposal Date ', proposal_doc_date
print 'Proposal PDF ', pdf_url
def download_pdf(url_list):
    for i in url_list:
        response = urllib3.urlopen(i)
        file = open(proposal_doc_date[i], 'wb')
        file.write(response.read())
        file.close()
    print("Completed")
download_pdf(pdf_url)
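You're importing both requests and urllib3, which serve the same purpose (requests is built on top of urllib3). The AttributeError occurs because urllib3 has no module-level urlopen function; that name belongs to urllib2 in Python 2 and urllib.request in Python 3. Here's how to do this using requests: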
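import requests
# ...
http = requests.Session()

def download_pdf(url_list):
    # Pair each URL with its date; indexing proposal_doc_date by the URL
    # string itself would raise a TypeError.
    for name, url in zip(proposal_doc_date, url_list):
        response = http.get(url)
        with open(name, 'wb') as f:
            f.write(response.content)
    print("Completed")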
And a similar approach using urllib3:
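import urllib3
# ...
http = urllib3.PoolManager()

def download_pdf(url_list):
    # Pair each URL with its date; indexing proposal_doc_date by the URL
    # string itself would raise a TypeError.
    for name, url in zip(proposal_doc_date, url_list):
        response = http.request('GET', url)
        with open(name, 'wb') as f:
            # The body is preloaded by default and available as response.data
            f.write(response.data)
    print("Completed")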
Both libraries also support streaming a response and writing it to a file as it arrives, which is useful for large PDFs. Check out the documentation for the respective project for more.
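As a rough illustration of the streaming approach with requests, here is a minimal sketch. The helper name stream_to_file and the default chunk size are my own choices for the example, not anything from the question; stream=True, raise_for_status() and iter_content() are part of the requests API.

```python
import requests


def stream_to_file(session, url, path, chunk_size=8192):
    """Download url to path in chunks and return the number of bytes written.

    stream_to_file is a hypothetical helper name used for illustration.
    """
    written = 0
    # stream=True defers downloading the body until it is iterated over,
    # so the whole file never has to sit in memory at once.
    with session.get(url, stream=True) as response:
        response.raise_for_status()  # raise on 4xx/5xx responses
        with open(path, 'wb') as f:
            for chunk in response.iter_content(chunk_size=chunk_size):
                f.write(chunk)
                written += len(chunk)
    return written
```

Inside the download loop you would then call something like stream_to_file(http, url, name), with http being the requests.Session() from the earlier snippet, in place of the get/open/write lines.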