
Is it possible to read from and append to a tar archive with Python's tarfile?


I'm trying to read a tar file, identify some files in it, read them, and then write a new file to the same tar file with Python. It appears that extractfile() is only allowed when the archive is opened in mode "r". Is this the case? Is there a way to both extract files from a tar into memory and append new files to it at the same time? Sample code below:

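import re
import tarfile

def genEntry(tar, tarinfo, source):
    # Return [relative path, title], where title is the first Markdown heading in the member
    heading = re.compile(r'#+\s*')
    f = tar.extractfile(tarinfo)  # this is the call that fails in mode 'a'
    while True:
        line = f.readline().decode()
        if not line:  # end of file
            break
        print(line)
        if heading.match(line):
            title = heading.sub('', line).replace('\n', '')
            return [tarinfo.name.replace(source, '.'), title]
    # No heading found: fall back to the member name
    return [tarinfo.name.replace(source, '.'), tarinfo.name.replace(source, '')]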

with tarfile.open(args.source, mode='a') as tar:
  source = 'somepath'
  subDir = 'someSubDir'
  path = '/'.join((source, subDir))
  if tar.getmember(path):  # note: getmember() raises KeyError if path is missing
    pathre = re.compile(r'{}\/.+?\/readme\.md'.format(re.escape(path)), re.IGNORECASE)
    for tarinfo in tar.getmembers():
      if pathre.search(tarinfo.name):
        genEntry(tar, tarinfo, source)
...

This will generate the following error:

OSError: bad operation for mode 'a'
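
For anyone who wants to reproduce this without the rest of the script, here is a minimal sketch. The archive and member names are made up for illustration, and the helper names are hypothetical. It suggests that member lookups still work in append mode and that it is the extractfile() call that raises.

```python
import io
import os
import tarfile
import tempfile


def try_extract_in_append_mode(tar_path, member_name):
    """Return the OSError message raised by extractfile() in mode 'a', or None."""
    with tarfile.open(tar_path, mode="a") as tar:
        member = tar.getmember(member_name)  # lookups are allowed in append mode
        try:
            tar.extractfile(member)  # reading member data is not
        except OSError as exc:
            return str(exc)
    return None


def demo_append_mode_error():
    """Build a one-file archive in a temp dir and try to read it in mode 'a'."""
    with tempfile.TemporaryDirectory() as tmp:
        tar_path = os.path.join(tmp, "sample.tar")  # hypothetical file name
        body = b"# Title\n"
        info = tarfile.TarInfo("docs/readme.md")  # hypothetical member name
        info.size = len(body)
        with tarfile.open(tar_path, mode="w") as tar:
            tar.addfile(info, fileobj=io.BytesIO(body))
        return try_extract_in_append_mode(tar_path, "docs/readme.md")


if __name__ == "__main__":
    print(demo_append_mode_error())
```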


Solution

  • As far as I can tell, it is not possible to read from and append to a tar file in a single pass: extractfile() requires the archive to be open in read mode, while addfile() requires a write or append mode. I eventually moved to streaming the tar file in and out of my Python script, but I did find a two-pass read/write solution to the question above.

    Here's essentially the approach I landed on.

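    import tarfile
    from io import BytesIO

    def readTar(tar):
        # Your logic to build the (tarinfo, payload) pairs you want to add to the archive goes here
        return []  # placeholder so the sketch runs

    def writeFile(tar, tarinfo, payload):
        # Append payload (a str) to the open archive, using tarinfo for the entry's metadata
        if len(payload) != 0:
            data = payload.encode('utf8')
            tarinfo.mode = 0o444
            tarinfo.size = len(data)
            tar.addfile(tarinfo, fileobj=BytesIO(data))
        return tar

    # Pass 1: open read-only and collect what needs to be written
    with tarfile.open(tarpath) as tar:
        files = readTar(tar)
    # Pass 2: reopen in append mode and add the new entries
    with tarfile.open(tarpath, mode='a') as tar:
        for fileobj in files:
            writeFile(tar, fileobj[0], fileobj[1])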
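
To make the two passes concrete, here is a self-contained sketch that can be run as-is. It is only an illustration under some assumptions: the helper names (read_first_lines, append_text, add_index, demo), the index.txt entry, and the sample archive contents are all hypothetical and not part of the original script.

```python
import io
import os
import tarfile
import tempfile


def read_first_lines(tar_path, suffix):
    """Pass 1: open read-only and return (member name, first line) pairs."""
    found = []
    with tarfile.open(tar_path, mode="r") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.lower().endswith(suffix):
                with tar.extractfile(member) as f:
                    found.append((member.name, f.readline().decode("utf8").strip()))
    return found


def append_text(tar_path, name, text):
    """Pass 2: reopen in append mode and add one file built in memory."""
    data = text.encode("utf8")
    info = tarfile.TarInfo(name)
    info.size = len(data)  # size must match the bytes handed to addfile()
    info.mode = 0o444
    with tarfile.open(tar_path, mode="a") as tar:
        tar.addfile(info, fileobj=io.BytesIO(data))


def add_index(tar_path, index_name="index.txt"):
    """Read every readme.md, then append a small index listing their first lines."""
    entries = read_first_lines(tar_path, "readme.md")
    text = "".join(f"{name}: {line}\n" for name, line in entries)
    if text:
        append_text(tar_path, index_name, text)
    return text


def demo():
    """Create a throwaway archive, run both passes, and return the member names."""
    with tempfile.TemporaryDirectory() as tmp:
        tar_path = os.path.join(tmp, "demo.tar")
        with tarfile.open(tar_path, mode="w") as tar:
            for name, body in (("a/readme.md", b"# Alpha\n"), ("b/readme.md", b"# Beta\n")):
                info = tarfile.TarInfo(name)
                info.size = len(body)
                tar.addfile(info, fileobj=io.BytesIO(body))
        add_index(tar_path)
        with tarfile.open(tar_path) as tar:
            return tar.getnames()


if __name__ == "__main__":
    print(demo())
```

One caveat: mode 'a' only applies to uncompressed archives, so a .tar.gz or similar would have to be rewritten to a new file rather than appended to in place.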