Python: tarfile stream


I would like to read some files from a tarball and save them to a new tarball. This is the code I wrote:

import tarfile

archive = 'dum/2164/archive.tar'

# Read input data.
input_tar = tarfile.open(archive, 'r|')
tarinfo = input_tar.next()
input_tar.close()

# Write output file.
output_tar = tarfile.open('foo.tar', 'w|')
output_tar.addfile(tarinfo)
output_tar.close()

Unfortunately, the output tarball is no good:

$ tar tf foo.tar
./1QZP_A--2JED_A--not_reformatted.dat.bz2
tar: Truncated input file (needed 1548288 bytes, only 1545728 available)
tar: Error exit delayed from previous errors.

Any clue how to read and write tarballs on the fly with Python?


Solution

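  • OK, so this is how I managed to do it. Calling addfile(tarinfo) on its own writes only the member's header, not its contents, which is why tar reported a truncated file. The data has to be passed as a second argument, obtained with extractfile() while the input tarball is still open.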

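    import tarfile

    archive = 'dum/2164/archive.tar'
    
    # Read input data. In stream mode ('r|') the archive cannot be rewound,
    # so get a file object for the member before moving on.
    input_tar = tarfile.open(archive, 'r|')
    tarinfo = input_tar.next()
    fileobj = input_tar.extractfile(tarinfo)
    
    # Write output file: pass the member's data along with its header.
    output_tar = tarfile.open('foo.tar', 'w|')
    output_tar.addfile(tarinfo, fileobj)
    
    # Close the input only after its data has been copied.
    input_tar.close()
    output_tar.close()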
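
The snippet above copies only the first member. As a rough sketch of how the same idea might extend to several members, the function below walks the input stream once and copies each member as it is encountered. The name copy_members and its wanted argument are hypothetical, introduced only for illustration; the tarfile calls are the same ones used above, plus TarInfo.isreg() to avoid calling extractfile() on directories and links.

```python
import tarfile


def copy_members(src_path, dst_path, wanted=None):
    """Stream members from src_path into a new tarball at dst_path.

    `wanted` is a hypothetical optional predicate on the member name;
    when it is None, every member is copied.
    Returns the names of the members that were copied, in order.
    """
    copied = []
    with tarfile.open(src_path, "r|") as src, \
            tarfile.open(dst_path, "w|") as dst:
        for member in src:
            if wanted is not None and not wanted(member.name):
                continue
            if member.isreg():
                # Regular file: hand over header and data together.
                # This must happen now, because a stream cannot seek back.
                dst.addfile(member, src.extractfile(member))
            else:
                # Directories, links, etc.: write the header only.
                dst.addfile(member)
            copied.append(member.name)
    return copied
```

For example, copy_members('dum/2164/archive.tar', 'foo.tar', wanted=lambda name: name.endswith('.bz2')) would keep only the .bz2 members. Using with blocks closes both tarballs even if copying fails partway through.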