
Download images from IPFS


I have a nice set of URLs saved in this format:

Number  Link
0   https://ipfs.io/ipfs/QmRRPWG96cmgTn2qSzjwr2qvfNEuhunv6FNeMFGa9bx6mQ
1   https://ipfs.io/ipfs/QmPbxeGcXhYQQNgsC6a36dDyYUcHgMLnGKnF8pVFmGsvqi
2   https://ipfs.io/ipfs/QmcJYkCKK7QPmYWjp4FD2e3Lv5WCGFuHNUByvGKBaytif4
3   https://ipfs.io/ipfs/QmYxT4LnK8sqLupjbS6eRvu1si7Ly2wFQAqFebxhWntcf6
4   https://ipfs.io/ipfs/QmSg9bPzW9anFYc3wWU5KnvymwkxQTpmqcRSfYj7UmiBa7
5   https://ipfs.io/ipfs/QmNwbd7ctEhGpVkP8nZvBBQfiNeFKRdxftJAxxEdkUKLcQ
6   https://ipfs.io/ipfs/QmWBgfBhyVmHNhBfEQ7p1P4Mpn7pm5b8KgSab2caELnTuV
7   https://ipfs.io/ipfs/QmRsJLrg27GQ1ZWyrXZFuJFdU5bapfzsyBfm3CAX1V1bw6

I am trying to loop through all of the links and save each file:

import urllib.request

# link and num hold the URL and number columns (see EDIT below)
for x, y in zip(link, num):
    url = str(x)
    name = str(y)
    filename = "%s.png" % name
    urllib.request.urlretrieve(url, filename)

Every time I run this code I get this error:

URLError: <urlopen error [WinError 10054] An existing connection was forcibly closed by the remote host>

What is weird is that if I run the code on just one URL, it works fine:

import urllib.request

name = 1
filename = "%s.png" % name   
urllib.request.urlretrieve("https://ipfs.io/ipfs/QmcJYkCKK7QPmYWjp4FD2e3Lv5WCGFuHNUByvGKBaytif4", filename)

How can this be fixed so that the code runs in a loop with no errors?

Thanks.

EDIT

Here is some code that works for one image:

import pandas as pd
import urllib.request

links = [['number', 'link'], ['1', 'https://ipfs.io/ipfs/QmPbxeGcXhYQQNgsC6a36dDyYUcHgMLnGKnF8pVFmGsvqi'], ['2', 'https://ipfs.io/ipfs/QmcJYkCKK7QPmYWjp4FD2e3Lv5WCGFuHNUByvGKBaytif4'], ['3', 'https://ipfs.io/ipfs/QmYxT4LnK8sqLupjbS6eRvu1si7Ly2wFQAqFebxhWntcf6']]
# Use the first row as the column names so the lookups below find them
data = pd.DataFrame(links[1:], columns=links[0])

link = data.get('link', None)
num = data.get('number', None)


name = 1
filename = "%s.png" % name
urllib.request.urlretrieve("https://ipfs.io/ipfs/QmYxT4LnK8sqLupjbS6eRvu1si7Ly2wFQAqFebxhWntcf6", filename)
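
For reference, here is how the loop from the top of the question attaches to that DataFrame (a minimal sketch; run as-is it will still hit the same throttling error, which the Solution below addresses):

# Download each link to <number>.png, exactly as in the loop above
for url, name in zip(link, num):
    filename = "%s.png" % name
    urllib.request.urlretrieve(url, filename)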

Solution

  • You are being throttled by the IPFS service. You need to implement API rate limiting (or see if the service has a premium option that allows you to pay for higher API request rates).

    Here's one way to implement client-side rate limiting, using exponential backoff/retry:

    1. save this retry code as retry.py (a minimal sketch of such a decorator is given after the client code below)
    2. fix a couple of Python 2 issues in retry.py (except ExceptionToCheck as e: at line 32 and print(msg) at line 37)
    3. modify your client code as follows:
    import urllib.error
    import urllib.request
    from retry import retry
    
    LINKS = [
        "https://ipfs.io/ipfs/QmRRPWG96cmgTn2qSzjwr2qvfNEuhunv6FNeMFGa9bx6mQ",
        "https://ipfs.io/ipfs/QmPbxeGcXhYQQNgsC6a36dDyYUcHgMLnGKnF8pVFmGsvqi",
        "https://ipfs.io/ipfs/QmcJYkCKK7QPmYWjp4FD2e3Lv5WCGFuHNUByvGKBaytif4",
        "https://ipfs.io/ipfs/QmYxT4LnK8sqLupjbS6eRvu1si7Ly2wFQAqFebxhWntcf6",
        "https://ipfs.io/ipfs/QmSg9bPzW9anFYc3wWU5KnvymwkxQTpmqcRSfYj7UmiBa7",
        "https://ipfs.io/ipfs/QmNwbd7ctEhGpVkP8nZvBBQfiNeFKRdxftJAxxEdkUKLcQ",
        "https://ipfs.io/ipfs/QmWBgfBhyVmHNhBfEQ7p1P4Mpn7pm5b8KgSab2caELnTuV",
        "https://ipfs.io/ipfs/QmRsJLrg27GQ1ZWyrXZFuJFdU5bapfzsyBfm3CAX1V1bw6",
    ]
    
    @retry(urllib.error.URLError, tries=4)
    def download(index, url):
        filename = "%s.png" % index
        urllib.request.urlretrieve(url, filename)
    
    def main():
        for index, link in enumerate(LINKS):
            print(index, link)
            download(index, link)
    
    if __name__ == '__main__':
        main()
    

    I tested this code without retries and it was throttled (as expected). Then I added the retry decorator and it completed successfully (including a couple of expected retries).
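
    The retry module itself isn't reproduced in this answer, so as a rough illustration, a minimal backoff/retry decorator along those lines might look like the sketch below (the names, defaults, and printout here are assumptions, not the actual retry.py):

    import functools
    import time

    def retry(exceptions, tries=4, delay=3, backoff=2):
        # Retry the wrapped callable when it raises the given exception
        # type(s), multiplying the wait by `backoff` after each failure
        # (exponential backoff).
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                remaining, wait = tries, delay
                while remaining > 1:
                    try:
                        return func(*args, **kwargs)
                    except exceptions as e:
                        print("%s, retrying in %d seconds..." % (e, wait))
                        time.sleep(wait)
                        remaining -= 1
                        wait *= backoff
                return func(*args, **kwargs)  # final attempt; errors propagate
            return wrapper
        return decorator

    A simpler (if slower) alternative is a fixed time.sleep() between downloads, which spaces out the requests enough to avoid the throttling in the first place.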