python linux rsync

Python rsync script directory names mirror


I have a script that I use to push files back to my home PC using rsync. The names of successfully pushed files are added to a SQLite database so they don't get pushed again (since I only want a one-way mirror). The problem is that although the script recursively walks the source path and pushes files based on a defined extension, all of the files end up in the same destination root directory.

What I am trying to do is give the destination the same folder structure as the source.

I think I have to add something to the destDir path, but I'm not exactly sure what:

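import os
import subprocess

for root, dirs, files in os.walk(sourceDir):
    for file in files:
        # (filtering criteria go here, e.g. extension check and database lookup)
        print("Syncing new file: " + file)
        cmd = ["rsync"]
        cmd.append(os.path.join(root, file))
        # Every file is sent to the same destination root, which is the problem
        cmd.append(destDir + "/")
        p = subprocess.Popen(cmd, shell=False)
        # Only remember the file if rsync exited successfully
        if p.wait() == 0:
            rememberFile(file)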

Solution

  • I think you should rely on rsync's own features for this as much as possible, rather than reimplementing them in Python. rsync is extensively tested and full-featured, and it already handles the problems you're running into. For instance, in your original code snippet, you would need to reconstruct each file's path relative to sourceDir (instead of using just the filename) and append that to destDir.
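
If you do want to keep the per-file loop, here is a minimal sketch of that path reconstruction. Rather than joining paths onto destDir by hand, it leans on rsync's -R (--relative) option: when the source path contains "/./", rsync recreates only the part after that marker under the destination (the man page describes this for rsync 2.6.7 and later on the sending side). The names relative_rsync_cmd, push_tree, wanted and remember are hypothetical; wanted stands in for your filtering criteria and remember for your rememberFile.

```python
import os
import subprocess


def relative_rsync_cmd(source_dir, root, filename, dest_dir):
    """Build an rsync command that keeps the file's subdirectory under dest_dir.

    Returns the command list and the file's path relative to source_dir.
    """
    # Path of the file relative to the top of the walk, e.g. "a/b/photo.jpg".
    rel_path = os.path.relpath(os.path.join(root, filename), source_dir)
    # With -R, rsync recreates the part of the source path after "/./".
    source = os.path.join(source_dir, ".", rel_path)
    return ["rsync", "-R", source, dest_dir.rstrip("/") + "/"], rel_path


def push_tree(source_dir, dest_dir, wanted, remember):
    """Walk source_dir and push each wanted file, one rsync call per file."""
    for root, dirs, files in os.walk(source_dir):
        for name in files:
            if not wanted(name):
                continue
            cmd, rel_path = relative_rsync_cmd(source_dir, root, name, dest_dir)
            # Record the relative path, not the bare name, so that two files
            # with the same name in different folders are tracked separately.
            if subprocess.call(cmd) == 0:
                remember(rel_path)
```

A call might look like push_tree(sourceDir, destDir, lambda n: n.endswith(".jpg"), rememberFile), where the ".jpg" filter is only an illustration.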

    But before you keep debugging that, consider this alternative. Instead of a SQLite database, why not keep the names of all the files you have pushed in a plain text file? Let's say it's called exclude_list.txt. Then your one-liner rsync command is:

    rsync -r --exclude-from 'exclude_list.txt' src dst
    

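    The -r switch makes rsync recurse through the source tree and recreate its directory structure at the destination, and --exclude-from skips every path that matches a pattern listed in exclude_list.txt. See topic #6 on this page for more details on this syntax.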

    Now you only need your Python script to maintain exclude_list.txt. I can think of two options:

    • Capture the output of rsync with the -v option to list the filenames that were transferred, parse them, and append them to exclude_list.txt. I think this is the most elegant solution. You can probably do it in just a few lines.
    • Use the script you already have to traverse the tree and add all of the files to exclude_list.txt, but remove all of the individual rsync calls. Then call rsync once at the end, as above.
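
The first option could be sketched roughly as follows. Instead of parsing -v output, which also contains header and summary lines, this sketch assumes rsync's --out-format=%n option, which prints one transferred name per line. The function names transferred_files and push_and_remember are hypothetical, and prefixing each name with "/" to anchor the pattern at the root of the transfer is an assumption you should check against your own src and dst. File names containing wildcard characters such as * or [ would need extra escaping that is not handled here.

```python
import subprocess

EXCLUDE_FILE = "exclude_list.txt"  # file name taken from the answer above


def transferred_files(rsync_output):
    """Return the file names rsync reported, skipping blank lines and directories."""
    names = []
    for name in rsync_output.splitlines():
        # With --out-format=%n, directories are printed with a trailing "/".
        # They must not be excluded, or new files inside them would be skipped.
        if name and not name.endswith("/"):
            names.append(name)
    return names


def push_and_remember(src, dst, exclude_file=EXCLUDE_FILE):
    """Run one recursive rsync and append the transferred files to the exclude list."""
    # Make sure the exclude file exists before rsync tries to read it.
    open(exclude_file, "a").close()
    cmd = ["rsync", "-r", "--out-format=%n",
           "--exclude-from", exclude_file, src, dst]
    result = subprocess.run(cmd, stdout=subprocess.PIPE, universal_newlines=True)
    if result.returncode != 0:
        # Nothing is recorded on failure; the next run simply tries again.
        return []
    names = transferred_files(result.stdout)
    with open(exclude_file, "a") as f:
        for name in names:
            # Assumption: a leading "/" anchors the pattern at the transfer root.
            f.write("/" + name + "\n")
    return names
```

Calling push_and_remember("src", "dst") after each batch of new files keeps exclude_list.txt growing, so files removed from the destination are not pushed again.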