
Read .tar.gz file in Python


I have a 25 GB text file. I compressed it to tar.gz, which reduced it to 450 MB. Now I want to read that file from Python and process the text data. I followed another question on this, but in my case the code doesn't work. The code is as follows:

import tarfile
import numpy as np 

tar = tarfile.open("filename.tar.gz", "r:gz")
for member in tar.getmembers():
    f = tar.extractfile(member)
    content = f.read()
    Data = np.loadtxt(content)

The error is as follows:

Traceback (most recent call last):
  File "dataExtPlot.py", line 21, in <module>
    content = f.read()
AttributeError: 'NoneType' object has no attribute 'read'

Also, is there any other way to do this?


Solution

  • The docs tell us that extractfile() returns None if the member is not a regular file or link (for example, a directory entry in the archive).

    One possible solution is to skip over the None results:

    import tarfile

    with tarfile.open("filename.tar.gz", "r:gz") as tar:
        for member in tar.getmembers():
            f = tar.extractfile(member)
            # extractfile() returns None for directories and other non-file members
            if f is not None:
                content = f.read()