Tags: python, python-3.x, web-scraping, urllib2, urllib

How can I see the download progress?


When using urllib.request.urlretrieve, is there a way to print the status of the download to the console, along with information such as the file size?

Here's the code I was testing with:

#!/usr/bin/env python3

import urllib.request
import os


# make sure to change the directory before you test the code
directory = r'C:\Users\SalahGfx\Desktop\Downloads'

url = 'https://upload.wikimedia.org/wikipedia/en/d/d8/C4D_Logo.png' 

def get_name_path():
    image_name = url.split('/')[-1]
    return os.path.join(directory, image_name)

urllib.request.urlretrieve(url, get_name_path())

print('The Image has been downloaded!')
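For what it's worth, urlretrieve itself accepts an optional reporthook callback. According to the Python documentation it is called once when the connection is established and once after each block is read, with three arguments: the number of blocks transferred so far, the block size in bytes, and the total file size (-1 when the size is unknown). The sketch below assumes that interface; the helper names progress_text, report_hook and download_with_progress are hypothetical, not part of urllib.

```python
import sys
import urllib.request


def progress_text(block_num, block_size, total_size):
    """Format one progress line from urlretrieve's reporthook arguments (hypothetical helper)."""
    downloaded = block_num * block_size
    if total_size > 0:
        # The last block usually overshoots the real size, so clamp it
        downloaded = min(downloaded, total_size)
        percent = downloaded * 100 / total_size
        return "Downloaded %d of %d bytes (%0.2f%%)" % (downloaded, total_size, percent)
    # total_size is -1 when the server sends no Content-Length
    return "Downloaded %d bytes" % downloaded


def report_hook(block_num, block_size, total_size):
    # '\r' moves the cursor back so each call overwrites the same console line
    sys.stdout.write(progress_text(block_num, block_size, total_size) + "\r")
    sys.stdout.flush()


def download_with_progress(url, filename):
    # urlretrieve calls report_hook once on connection and once per block read
    path, headers = urllib.request.urlretrieve(url, filename, reporthook=report_hook)
    sys.stdout.write("\n")  # finish the progress line
    return path
```

Calling `download_with_progress(url, get_name_path())` in place of the original `urlretrieve` line should print the progress while the image downloads.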

Solution

  • urlopen has no built-in progress display, so you have to write it yourself. Luckily, it isn't too difficult: read the length of the content from the Content-Length header, then read the response in chunks, writing each chunk to the file and reporting progress as you go. Here's the code:

    import os
    import sys
    import urllib.request

    def chunk_report(bytes_so_far, chunk_size, total_size):
        # Print progress on one console line; '\r' returns the cursor to its start
        percent = round(float(bytes_so_far) / total_size * 100, 2)
        sys.stdout.write("Downloaded %d of %d bytes (%0.2f%%)\r" %
                         (bytes_so_far, total_size, percent))
        sys.stdout.flush()

        if bytes_so_far >= total_size:
            sys.stdout.write('\n')

    def chunk_read(response, out_file, chunk_size=8192, report_hook=None):
        # Total size comes from the Content-Length header ('0' if it is missing)
        total_size = int(response.getheader('Content-Length', '0').strip())
        bytes_so_far = 0

        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break

            # Save the chunk, otherwise the download is read but discarded
            out_file.write(chunk)
            bytes_so_far += len(chunk)

            # Skip reporting when the size is unknown to avoid dividing by zero
            if report_hook and total_size:
                report_hook(bytes_so_far, chunk_size, total_size)

        return bytes_so_far

    def get_name_path():
        image_name = url.split('/')[-1]
        return os.path.join(directory, image_name)

    # make sure to change the directory before you test the code
    directory = r'C:\Users\SalahGfx\Desktop\Downloads'

    url = 'https://upload.wikimedia.org/wikipedia/en/d/d8/C4D_Logo.png'

    # urlopen's second argument is POST data, not a file name, so pass only the URL
    with urllib.request.urlopen(url) as response, open(get_name_path(), 'wb') as out_file:
        chunk_read(response, out_file, report_hook=chunk_report)
    

    It's important to note that I used urllib.request.urlopen because urllib.request.urlretrieve is a legacy interface carried over from Python 2, and the documentation notes that it might become deprecated in the future.
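Because the chunk loop only needs objects with read() and write() methods, it can be exercised without any network access. The sketch below is a minimal, hypothetical generalization of chunk_read named copy_in_chunks; io.BytesIO stands in for both the HTTP response and the output file, and the 20000-byte payload is made up for the check.

```python
import io


def copy_in_chunks(src, dst, total_size, chunk_size=8192, report_hook=None):
    """Copy src to dst chunk by chunk, calling report_hook after each chunk.

    src and dst can be any binary file-like objects, for example the
    response from urllib.request.urlopen and a file opened with mode 'wb'.
    """
    bytes_so_far = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:  # an empty bytes object signals the end of the stream
            break
        dst.write(chunk)
        bytes_so_far += len(chunk)
        if report_hook:
            report_hook(bytes_so_far, chunk_size, total_size)
    return bytes_so_far


# Offline check: an in-memory 20000-byte "response" replaces the network
progress = []
source = io.BytesIO(b"x" * 20000)
target = io.BytesIO()
copied = copy_in_chunks(source, target, 20000,
                        report_hook=lambda done, size, total: progress.append(done))
```

In the answer's script, `src` would be the object returned by `urllib.request.urlopen(url)`, `dst` a file opened with `open(get_name_path(), 'wb')`, and `total_size` the value read from the Content-Length header.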