python, python-3.7, tarfile

tarfile filter in python 3.7


This is the part of my backup script that I separated out for testing. It used to work in Python 3.6, but in Python 3.7 I get errors. I checked the docs but can't figure out what changed.

#! /usr/bin/python

import os
import tarfile
import arrow


stmp = arrow.now().format('YYYY-MM-DD')
backup_path = "/home/akya/playground/"

###########################################################
# Functions
###########################################################

def exclude_function(filename):
    if filename in exclude_files or os.path.splitext(filename)[1] in exclude_files:
        return True
    else:
        return False

def compress():
    with tarfile.open(myfile, "w:gz") as tar:
        for name in L:
            tar.add(name, filter=exclude_function)

path1 = "/home/akya/playground/test"
backup_name = "test-"

L = [path1]
exclude_files = [".sh", ".html", ".json", "/home/akya/playground/test/.git"]
myfile = str(backup_path + backup_name + stmp + ".tar.gz")

compress()

I want the archive to exclude every extension and path listed in exclude_files, but this is what I get:

Traceback (most recent call last):
  File "exec.py", line 33, in <module>
    compress()
  File "exec.py", line 24, in compress
    tar.add(name, filter=exclude_function)
  File "/usr/lib/python3.7/tarfile.py", line 1934, in add
    tarinfo = filter(tarinfo)
  File "exec.py", line 16, in exclude_function
    if filename in exclude_files or os.path.splitext(filename)[1] in exclude_files:
  File "/usr/lib/python3.7/posixpath.py", line 122, in splitext
    p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not TarInfo

Can anyone point out the change required, or a better way to use filters?


Solution

  • The filter callback receives a TarInfo object, not a file name, so you can filter on more than the name: size, modification time, permissions, owner and so on (it looks very much like a stat result).

    Also, the callback is not a predicate that returns a boolean: you have to return None to exclude the entry, and return the TarInfo (possibly modified) to keep it:

    TarFile.add(name, arcname=None, recursive=True, *, filter=None)

    ... If filter is given, it should be a function that takes a TarInfo object argument and returns the changed TarInfo object. If it instead returns None the TarInfo object will be excluded from the archive. See Examples for an example.

    In your case, just read the .name field and return None if it matches your filter; otherwise return the same TarInfo object you received.

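    def exclude_function(tarinfo):
        filename = tarinfo.name
        # tarfile strips the leading "/" from member names, so compare the absolute form
        if ("/" + filename) in exclude_files or os.path.splitext(filename)[1] in exclude_files:
            return None  # returning None excludes this entry
        return tarinfo  # returning the TarInfo keeps it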
    

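    (Note: I'm not sure why it worked for you in earlier versions. A likely explanation is that under 3.6 the script passed the function through the old exclude parameter, which received the file name and returned True to skip it; exclude was deprecated in Python 3.2 and removed in 3.7, leaving only filter. Otherwise, maybe you added the "or os.path.splitext(filename)[1] in exclude_files" part later: that part is what reveals the wrong type, while the first in test only compares the TarInfo object with strings and wouldn't crash. Either way, returning None or the TarInfo is the proper way of doing it.)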
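A self-contained sketch of the whole idea, assuming POSIX paths and the same goal as the question (skip some extensions and some absolute paths). make_exclude_filter, build_demo_archive and the throwaway directory tree are hypothetical names made up for this illustration. The excluded paths are converted to archive form the way tarfile converts member names (leading / removed), and since returning None for a directory also stops add() from descending into it, excluding .git drops everything under it.

```python
import os
import tarfile
import tempfile


def make_exclude_filter(exclude_exts, exclude_paths):
    """Build a TarFile.add() filter (hypothetical helper for this sketch).

    exclude_exts: extensions such as ".sh"; exclude_paths: absolute POSIX paths.
    """
    # tarfile stores member names without the leading "/", so normalise the
    # excluded paths the same way before comparing with tarinfo.name.
    archive_paths = {p.lstrip("/") for p in exclude_paths}
    exts = set(exclude_exts)

    def exclude_function(tarinfo):
        name = tarinfo.name
        if name in archive_paths or os.path.splitext(name)[1] in exts:
            return None  # None means "leave this member out"
        return tarinfo  # keep the member unchanged

    return exclude_function


def build_demo_archive(workdir):
    """Create a small tree under workdir, archive it, return member names."""
    src = os.path.join(workdir, "test")
    os.makedirs(os.path.join(src, ".git"))
    for rel in ("keep.txt", "run.sh", "page.html", os.path.join(".git", "HEAD")):
        with open(os.path.join(src, rel), "w") as fh:
            fh.write("data\n")

    archive = os.path.join(workdir, "test.tar.gz")
    exclude = make_exclude_filter([".sh", ".html", ".json"],
                                  [os.path.join(src, ".git")])
    with tarfile.open(archive, "w:gz") as tar:
        tar.add(src, filter=exclude)

    # Read the archive back to see which members survived the filter.
    with tarfile.open(archive, "r:gz") as tar:
        return sorted(tar.getnames())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for member in build_demo_archive(tmp):
            print(member)
```

Building the filter from a closure avoids relying on the global exclude_files list. On Windows the path normalisation would also need the drive and backslash handling that tarfile applies to member names, which this sketch leaves out.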
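Because the callback gets the whole TarInfo, it can decide on size or modification time instead of the name, and it can modify the member before returning it. A minimal sketch under arbitrary assumptions: make_attribute_filter and demo are hypothetical helpers, and the 1024-byte limit and one-hour cutoff are chosen only so the demo has something to exclude. The ownership reset mirrors the reset example in the tarfile documentation.

```python
import io
import os
import tarfile
import tempfile
import time


def make_attribute_filter(max_size, newer_than):
    """Hypothetical filter factory: judge members by TarInfo fields, not name.

    max_size: largest regular file (in bytes) to keep.
    newer_than: POSIX timestamp; older regular files are skipped.
    """
    def attribute_filter(tarinfo):
        if tarinfo.isfile() and (tarinfo.size > max_size
                                 or tarinfo.mtime < newer_than):
            return None  # exclude this member
        # The filter may also change the member before it is written,
        # here by resetting the ownership information.
        tarinfo.uid = tarinfo.gid = 0
        tarinfo.uname = tarinfo.gname = "root"
        return tarinfo

    return attribute_filter


def demo():
    """Archive a tiny tree in memory and return (name, uname) per member."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "data")
        os.mkdir(src)
        with open(os.path.join(src, "small.txt"), "wb") as fh:
            fh.write(b"x" * 10)
        with open(os.path.join(src, "big.bin"), "wb") as fh:
            fh.write(b"x" * 5000)  # larger than max_size below
        old = os.path.join(src, "old.txt")
        with open(old, "wb") as fh:
            fh.write(b"x" * 10)
        os.utime(old, (946684800, 946684800))  # 2000-01-01 UTC, older than cutoff

        buf = io.BytesIO()
        keep = make_attribute_filter(max_size=1024,
                                     newer_than=time.time() - 3600)
        with tarfile.open(fileobj=buf, mode="w") as tar:
            tar.add(src, arcname="data", filter=keep)

    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return [(m.name, m.uname) for m in tar.getmembers()]


if __name__ == "__main__":
    for name, owner in demo():
        print(name, owner)
```

Passing arcname="data" keeps the member names short; without it the names would be the source path with its leading "/" removed.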