How do I use the Python 3.6 tarfile module to read from memory?

I would like to download a tar file from a URL into memory and then extract all of its contents to the folder dst. How should I do this?

Below is my attempt, which does not work.

# -*- coding: utf-8 -*-

from pathlib import Path
from io import BytesIO
from urllib.request import Request, urlopen
from urllib.error import URLError
from tarfile import TarFile

def get_url_response( url ):
    req = Request( url )
    try:
        response = urlopen( req )
    except URLError as e:
        if hasattr( e, 'reason' ):
            print( 'We failed to reach a server.' )
            print( 'Reason: ', e.reason )
        elif hasattr( e, 'code'):
            print( 'The server couldn\'t fulfill the request.' )
            print( 'Error code: ', e.code )
    else:
        # everything is fine
        return response

url = ''
dst = Path().cwd() / 'Tar'

response = get_url_response( url )

with TarFile( BytesIO( response.read() ) ) as tfile:
    tfile.extractall( path=dst )

However, I got this error:

Traceback (most recent call last):
  File "~/", line 31, in <module>
    with TarFile( BytesIO( ) ) as tfile:
  File "/usr/lib/python3.6/", line 1434, in __init__
    fileobj = bltn_open(name, self._mode)
TypeError: expected str, bytes or os.PathLike object, not _io.BytesIO

I tried passing the BytesIO object to TarFile as a fileobj:

with TarFile( fileobj=BytesIO( response.read() ) ) as tfile:
    tfile.extractall( path=dst )

However, it still does not work:

Traceback (most recent call last):
  File "/usr/lib/python3.6/", line 188, in nti
    s = nts(s, "ascii", "strict")
  File "/usr/lib/python3.6/", line 172, in nts
    return s.decode(encoding, errors)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd2 in position 0: ordinal not in range(128)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/", line 2297, in next
    tarinfo = self.tarinfo.fromtarfile(self)
  File "/usr/lib/python3.6/", line 1093, in fromtarfile
    obj = cls.frombuf(buf, tarfile.encoding, tarfile.errors)
  File "/usr/lib/python3.6/", line 1035, in frombuf
    chksum = nti(buf[148:156])
  File "/usr/lib/python3.6/", line 191, in nti
    raise InvalidHeaderError("invalid header")
tarfile.InvalidHeaderError: invalid header

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "~/", line 31, in <module>
    with TarFile( fileobj=BytesIO( ) ) as tfile:
  File "/usr/lib/python3.6/", line 1482, in __init__
    self.firstmember =
  File "/usr/lib/python3.6/", line 2309, in next
    raise ReadError(str(e))
tarfile.ReadError: invalid header


  • This approach was very close to correct:

    with TarFile( fileobj=BytesIO( response.read() ) ) as tfile:
        tfile.extractall( path=dst )

    You should use tarfile.open() instead of TarFile (see docs), and tell it that you are reading an xz file (mode='r:xz'):

    with tarfile.open( fileobj=BytesIO( response.read() ), mode='r:xz' ) as tfile:
        tfile.extractall( path=dst )

    However, as you'll notice, this is still not enough.

    The root problem? You're downloading from a site which disallows hotlinking. The website is blocking your attempt to download. Try printing out the response and you'll see you get a load of junk HTML instead of a tar.xz file.
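
As a rough sketch of how the pieces fit together, the snippet below checks that the downloaded bytes really begin with the xz magic number before handing them to tarfile.open, so a blocked download fails with a clear message instead of an "invalid header" error. The helper names (looks_like_xz, extract_tar_xz_bytes, download_and_extract, make_tar_xz) and the sample file hello.txt are invented for illustration, and it assumes the archive is xz-compressed, as the answer above suggests. The demo builds a tiny archive in memory so it runs without network access.

```python
import io
import tarfile
import tempfile
from pathlib import Path
from urllib.request import urlopen

XZ_MAGIC = b"\xfd7zXZ\x00"  # the six bytes that start every .xz stream


def looks_like_xz(data):
    """Return True if the payload starts with the xz magic number."""
    return data[:6] == XZ_MAGIC


def extract_tar_xz_bytes(data, dst):
    """Extract an in-memory .tar.xz payload into the directory dst."""
    if not looks_like_xz(data):
        # A blocked download typically yields an HTML page, not an archive.
        raise ValueError("payload is not xz data; starts with %r" % data[:20])
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:xz") as tfile:
        # Only extract archives you trust.
        tfile.extractall(path=dst)


def download_and_extract(url, dst):
    """Hypothetical wrapper: fetch url into memory, then extract it."""
    with urlopen(url) as response:
        data = response.read()
    extract_tar_xz_bytes(data, dst)


def make_tar_xz(name, content):
    """Build a tiny .tar.xz archive in memory, for demonstration only."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tfile:
        info = tarfile.TarInfo(name=name)
        info.size = len(content)
        tfile.addfile(info, io.BytesIO(content))
    return buf.getvalue()


if __name__ == "__main__":
    archive = make_tar_xz("hello.txt", b"hi\n")
    with tempfile.TemporaryDirectory() as tmp:
        extract_tar_xz_bytes(archive, tmp)
        print((Path(tmp) / "hello.txt").read_bytes())
```

With a real URL you would call download_and_extract(url, dst) instead. If the server returns an HTML page, the ValueError shows the first bytes of the payload, which makes the hotlinking block easy to spot.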