Tags: python, python-requests, urllib

Getting the content of multiple URLs with headers and writing it to a file (Python 3.7)


I have multiple URLs that differ in their query string parameters, mainly in days, for instance:

urls = [f'https://example.com?query=from-{x+1}d+TO+-{x}d%data' for x in range(10)]

I want to write the content of all these URLs to a single file. I tried urllib.request:

import urllib.request

key = "some value"
requests = urllib.request.Request([url for url in urls], headers={"key":key})
<urllib.request.Request object at 0x7f48e8381490>

but the first pitfall is that a Request object is not iterable:

responses = urllib.request.urlopen([request for request in requests])
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'Request' object is not iterable

Ideally the result could go to a file as below:

data = open('file_name', 'a')
data.write([response.read() for response in responses])

I also tried the requests library:

import requests
test = requests.Session()
r = test.get([url for url in urls], headers={"key":key})

but this fails with:

    raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for <list of urls>

Is there a way to get the content of these urls with headers and to send it to a file?


Solution

  • I suppose you might want to do something like this:

    import urllib.request
    
    key = "some value"  # the header value from the question
    
    # open in binary append mode ("ab"): response.read() returns bytes,
    # which cannot be written to a file opened in text mode
    with open("file_name", "ab") as data:
        for url in urls:  # one request per URL, not a list in a single Request
            req = urllib.request.Request(url, headers={"key": key})
            with urllib.request.urlopen(req) as response:
                data.write(response.read())