import requests
import urllib3
from time import sleep
from sys import argv
script, filename = argv
http = urllib3.PoolManager()
datafile = open('datafile.txt','w')
crawl = ""
with open(filename) as f:
    mylist = f.read().splitlines()
def crawlling(x):
    for i in mylist:
        domain = ("http://" + "%s") % i
        crawl = http.request('GET','%s',preload_content=False) % domain
        for crawl in crawl.stream(32):
            print crawl
            sleep(10)
            crawl.release_conn()
            datafile.write(crawl.status)
            datafile.write('>>>>>>>>>>>>>>>>>>>>>>>>>>>>\n')
            datafile.write(crawl.data)
            datafile.close()
    return x
crawlling(crawl)
_______________________________________________________________________
Extract of domain.txt file:
fjarorojo.info
buscadordeproductos.com
I'm new to Python, so bear with me. I'm trying to fetch content from a list of URLs, but the script throws an error, even though the same URLs work fine in a browser. The goal of the script is to read the domains from the domain.txt file, iterate over them, fetch the content of each one, and save it to a file.
I'm getting this error:
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='%s',
port=80): Max retries exceeded with url: / (Caused by
NewConnectionError('<urllib3.connection.HTTPConnection object at
0x7ff45e4f9cd0>: Failed to establish a new connection: [Errno -2] Name or
service not known',))
This line is the problem:
crawl = http.request('GET','%s',preload_content=False) % domain
Here the % formatting is applied to the value returned by http.request(), not to the URL string, so the request itself is made to the literal host %s. That is not a valid domain, hence the error "Name or service not known".
It should be:
crawl = http.request('GET', '%s' % domain, preload_content=False)
Or more simply:
crawl = http.request('GET', domain, preload_content=False)
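To see why the position of the % operator matters, here is a minimal sketch that needs no network access. RecordingPool is a hypothetical stand-in (not part of urllib3) that only records the URL it receives, and the code is written for Python 3, whereas the question uses Python 2.

```python
class RecordingPool(object):
    """Hypothetical stand-in for urllib3.PoolManager; it only records URLs."""

    def __init__(self):
        self.urls = []

    def request(self, method, url, preload_content=True):
        # Remember exactly which URL string the caller passed in.
        self.urls.append(url)
        return object()


pool = RecordingPool()
domain = "http://fjarorojo.info"

# The URL argument of the original line: the placeholder is never filled in
# before the call, so the literal string '%s' is what gets requested.
pool.request('GET', '%s', preload_content=False)

# Corrected: the string is formatted first, then passed to request().
pool.request('GET', '%s' % domain, preload_content=False)

print(pool.urls)  # ['%s', 'http://fjarorojo.info']
```

The first recorded entry is the literal placeholder, which matches the host='%s' shown in the traceback.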
Also, unrelated to the error you posted, these lines will likely cause problems too:
for crawl in crawl.stream(32):
    print crawl
    sleep(10)
    crawl.release_conn() # <--
You're releasing the connection inside the loop, so the loop will fail to yield the expected results on the second iteration. Instead, release the connection only once, after you're done reading the response.
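One way to structure this is sketched below, assuming Python 3 and an object with the same request() interface as urllib3.PoolManager. The function name fetch_body and its parameters are illustrative and not from the original script. It also uses a separate loop variable (chunk), because in the original loop the name crawl is rebound to each chunk, so crawl.release_conn() is no longer called on the response.

```python
def fetch_body(http, url, chunk_size=32):
    """Stream a response in chunks and release the connection exactly once.

    `http` is assumed to behave like urllib3.PoolManager.
    """
    response = http.request('GET', url, preload_content=False)
    chunks = []
    try:
        # Use a distinct loop variable so `response` still refers to the
        # response object after the loop finishes.
        for chunk in response.stream(chunk_size):
            chunks.append(chunk)
    finally:
        # Release the connection back to the pool once, after all reading.
        response.release_conn()
    return response.status, b"".join(chunks)
```

With urllib3 installed, this could be called as fetch_body(urllib3.PoolManager(), "http://fjarorojo.info"), and the returned status and body could then be written to the output file. Note that the status is an integer, so it needs str() before being passed to write().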