Tags: python, github, wget

Downloading binary files from GitHub using wget in Python


I am creating a program that grabs files from a GitHub repository, but raw.githubusercontent.com doesn’t seem to work for binary files. If .ico files are not binary, please tell me how to download those instead.

I am using wget.download(), like this:

import wget
url = "https://raw.githubusercontent.com/user/repository/branch/file"
wget.download(url)

Any suggestions?


Solution

  • If you append ?raw=true to the end of a file's URL, it should work.

    For example, the .ico file for my homepage cannot be viewed in the browser:

    # Original Link
    https://github.com/hayesall/hayesall.github.io/blob/master/favicon.ico
    
    # (1) Appended with `?raw=true`
    https://github.com/hayesall/hayesall.github.io/blob/master/favicon.ico?raw=true
    
    # (2) Going through githubusercontent
    https://raw.githubusercontent.com/hayesall/hayesall.github.io/master/favicon.ico
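
    The mapping from the original link to form (2) can be sketched in code. This is only an illustration: blob_to_raw_url is a hypothetical helper (not part of wget or any other library), and it assumes the branch name contains no "/" character.

```python
from urllib.parse import urlsplit


def blob_to_raw_url(blob_url: str) -> str:
    """Turn a github.com "blob" page URL into a raw.githubusercontent.com URL.

    Hypothetical helper for illustration; assumes the branch name has no "/".
    """
    parts = urlsplit(blob_url)
    if parts.netloc != "github.com":
        raise ValueError("expected a github.com URL")

    # Expected path layout: /<user>/<repository>/blob/<branch>/<path to file>
    segments = parts.path.strip("/").split("/")
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError("expected a /<user>/<repo>/blob/<branch>/<path> URL")

    user, repo, _blob, branch, *file_path = segments
    # Any query string (such as ?raw=true) is dropped; the raw host does not need it.
    return "https://raw.githubusercontent.com/" + "/".join([user, repo, branch, *file_path])
```

    Passing the original link above to this helper should return the link shown under (2).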
    

    Option (1) or (2) may be downloaded with wget:

    import wget
    url = "https://github.com/hayesall/hayesall.github.io/blob/master/favicon.ico?raw=true"
    wget.download(url)
    100% [.......................] 1150 / 1150
    

    Checking the downloaded file:

    $ file favicon.ico
    favicon.ico: MS Windows icon resource - 1 icon, 16x16, 32 bits/pixel
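
    If the file command is not available, a similar check can be sketched in Python by reading the first six bytes of the download. The helper name looks_like_ico is made up for this example. It relies on the ICO header layout: a reserved 16-bit field that is 0, a type field that is 1 for icons, and an image count, all little-endian. If GitHub's HTML page was saved instead of the icon, the check fails.

```python
import struct


def looks_like_ico(data: bytes) -> bool:
    """Return True if `data` begins with a plausible ICO file header.

    Hypothetical helper for illustration only.
    """
    if len(data) < 6:
        return False
    # ICONDIR header: reserved (0), type (1 = icon), number of images;
    # each field is an unsigned little-endian 16-bit integer.
    reserved, image_type, count = struct.unpack("<HHH", data[:6])
    return reserved == 0 and image_type == 1 and count >= 1
```

    For the file above, one would open favicon.ico in "rb" mode, read the first six bytes, and pass them to looks_like_ico.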
    

    In Atom:

    [Screenshot from the Atom text editor: the code in the top pane and the .ico image data in the bottom pane.]

    Version information:

    $ wget --version
    GNU Wget 1.19.4 built on linux-gnu.
    $ pip freeze | grep "wget"
    wget==3.2
    

    If this still isn't working, the problem may be the Python wget package itself (its most recent update was in 2015). Here's an alternative solution using requests:

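    import shutil
    import requests
    
    url = "https://raw.githubusercontent.com/hayesall/hayesall.github.io/master/favicon.ico"
    # stream=True defers downloading the body until it is read from req.raw
    req = requests.get(url, stream=True)
    
    # Raise an exception on a 4xx/5xx response rather than saving an error page
    req.raise_for_status()
    
    with open("favicon.ico", "wb") as _fh:
        # Let urllib3 undo any gzip/deflate content encoding while copying
        req.raw.decode_content = True
        shutil.copyfileobj(req.raw, _fh)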
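
    If installing a third-party package is not an option, the same idea can be sketched with only the standard library. This is a minimal sketch, not a drop-in replacement: download_binary is a name invented for this example, and it does no retrying or timeout handling.

```python
import shutil
import urllib.request
from pathlib import Path


def download_binary(url: str, dest: str) -> int:
    """Copy the bytes at `url` into the file `dest`; return the size written.

    Hypothetical helper for illustration only.
    """
    # urlopen raises urllib.error.HTTPError for 4xx/5xx HTTP responses.
    # Opening the destination in "wb" mode keeps the bytes exactly as received.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as fh:
        shutil.copyfileobj(response, fh)
    return Path(dest).stat().st_size
```

    Calling download_binary with the raw.githubusercontent.com URL above and "favicon.ico" as the destination should save the icon to the working directory.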