Tags: python, windows, linux, macos, samba

How do I exclude files from a search that may be in use or still being copied, in Python?


I'm new to Python, so this might end up having a simple solution.

At my house, I have 3 computers that are relevant to this situation:

  • File server (Linux)
  • My main PC (Windows)
  • Girlfriend's MacBook Pro

My file server is running Ubuntu and Samba. I've installed Python 3.1 and written my code for 3.1.

I've created a daemon that checks the uploads directory every minute for files whose names follow a given pattern. Upon finding such a file, it renames it and moves it to a different location on a different drive. It also rewrites the owner, group, and permissions. All of this works great.

If I copy files from my main PC (running a flavor of Windows), the process always works. (I believe Windows locks the file until it's done copying, but I could be wrong.) If my girlfriend copies a file, the daemon picks up the file before the copy is complete and things get messy: underscored versions of the files are created with improper permissions, and only occasionally does the file end up in the correct place. I am guessing that her MacBook does not lock the file while copying. I could also be wrong there.

What I need is a way to exclude files that are either in use or, failing that, are being created.

For reference, the method I've created to find the files is:

import os

# _GetFileListing(pattern)
# Description: Gets a list of relevant files based on the pattern
#
# Parameters: pattern - a compiled regex query
# Returns:
#   Nothing. It populates self.fileList
def _GetFileListing(self, pattern):
    self.fileList = []
    for name in os.listdir(self.dir):
        filepath = os.path.join(self.dir, name)

        # Keep only regular files whose name matches the pattern
        if os.path.isfile(filepath) and pattern.search(name) is not None:
            self.fileList.append(filepath)

Note, this is all in a class.

The method I've created to manipulate the files is:

import grp
import os
import pwd
import shutil

# _ArchiveFile(filepath, outpath)
# Description: Renames/moves the file to outpath and rewrites the file permissions to the permissions used for
#   the output directory. See self.mask, self.group, and self.owner for the actual values.
#
# Parameters: filepath - path to the file
#             outpath - path to the file to output
def _ArchiveFile(self, filepath, outpath):
    outdir, filename, filetype = self._SplitDirectoryAndFile(outpath)

    try:
        os.makedirs(outdir, self.mask)
    except OSError:
        # Ignore the error only if the directory already exists
        if not os.path.isdir(outdir):
            raise

    uid = pwd.getpwnam(self.owner).pw_uid
    gid = grp.getgrnam(self.group).gr_gid
    # shutil.move is used instead of os.rename so the move works across drives
    shutil.move(filepath, outpath)
    os.chmod(outpath, self.mask)
    os.chown(outpath, uid, gid)

I've stopped using os.rename because it seems to have stopped working when I started moving files to different drives.
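
For anyone wondering why os.rename broke: it cannot move a file from one filesystem to another and raises OSError with errno EXDEV in that case, while shutil.move falls back to copying the data and deleting the source. The sketch below only makes that fallback visible; shutil.move already does this on its own. The function name move_across_drives and the rename parameter are made up for illustration (the parameter exists only so the fallback can be exercised without a second drive).

```python
import errno
import os
import shutil


def move_across_drives(src, dst, rename=os.rename):
    """Move src to dst, falling back to copy-and-delete across filesystems.

    `rename` is a hypothetical hook, injectable only for testing.
    """
    try:
        rename(src, dst)  # fast, but only works within one filesystem
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise  # a real error, not a cross-device move
        shutil.move(src, dst)  # copies the data, then removes the source
```

Note that the copy-and-delete path is not instantaneous, so anything watching the destination directory can see a partially written file there too.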

Short Version: How do I prevent myself from picking up files in my search that are currently being transferred?

Thank you in advance for any help you might be able to provide.


Solution

  • Turns out the write lock approach didn't work. I guess I didn't properly test it before updating here.

    What I've decided to do for now is:

    • Reduce the time between checks to 30s
    • Keep a list of files found in the previous iteration and their respective file sizes
    • Check the new list of files against the old list

    If the new list contains the same file with the same file size as the old list, put it in a list to be transferred. The remaining files in the new list become the old list and the process continues.
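
A minimal sketch of that size comparison, assuming a standalone function rather than my actual class. The name stable_files and the previous_sizes dict are made up for this example.

```python
import os


def stable_files(directory, previous_sizes):
    """Compare the current scan of `directory` against the previous scan.

    Returns (ready, current_sizes):
      ready         - paths whose size is unchanged since the previous scan
      current_sizes - {path: size} for the remaining files, to be passed
                      in as previous_sizes on the next scan
    """
    ready = []
    current_sizes = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        try:
            if not os.path.isfile(path):
                continue
            size = os.path.getsize(path)
        except OSError:
            continue  # the file vanished between listing and stat

        if previous_sizes.get(path) == size:
            ready.append(path)  # same size twice in a row: assume done
        else:
            current_sizes[path] = size  # new or still growing
    return ready, current_sizes
```

The daemon would call this once per cycle (every 30 seconds here), archive everything in ready, and keep current_sizes for the next call. A transfer that stalls for a full cycle would look finished, so this is a heuristic rather than a guarantee.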

    I'm sure the lsof method will work, but I'm not sure how to use it in Python. Also, the file-size method should work quite well for my situation, since I am mostly concerned with not moving the files while they're in transit.
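
For reference, one way the lsof check could be called from Python is through subprocess. This is an untested-on-my-server sketch: it assumes lsof is installed, that the daemon has enough privileges to see the Samba process holding the file, and that lsof exits with 0 when it finds the file open and non-zero otherwise. It also uses subprocess.DEVNULL, which needs Python 3.3 or later. The function name and the run parameter are made up for illustration.

```python
import subprocess


def is_open_by_any_process(path, run=subprocess.call):
    """Return True if lsof reports at least one process holding `path` open.

    `run` is a hypothetical hook so the check can be tested without lsof.
    """
    # Assumption: exit status 0 means lsof listed an open instance of the
    # file; any other status means nothing was found (or lsof failed).
    exit_code = run(
        ["lsof", "--", path],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return exit_code == 0
```

Files for which this returns True would simply be skipped until the next cycle.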

    I would also have to exclude all files that start with "._", since the Mac creates those and I'm not sure whether they increase in size over time.

    Alternatively, I have the option to handle just the cases where a file is being transferred by her Mac. I know that when the Mac is transferring the file, it creates:

    • filename.ext
    • ._filename.ext

    I could check the list for every filename that also appears with a "._" prefix and exclude those files.
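
A rough sketch of that filter, assuming (as I described above, not verified) that the "._" companion is present while the Mac is still transferring. The function name exclude_mac_transfers is made up for this example, and it works on bare filenames as returned by os.listdir.

```python
def exclude_mac_transfers(filenames):
    """Drop "._" companion files and any file that has such a companion."""
    names = set(filenames)
    return [
        name
        for name in filenames
        # Skip the companion itself, and skip files whose companion exists
        if not name.startswith("._") and "._" + name not in names
    ]
```

With a listing of a.mkv, ._a.mkv and b.mkv, only b.mkv would be kept. If the Mac leaves the "._" file behind after the copy finishes, the matching file would never be picked up, so that needs checking first.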

    I'll probably try the second option first. It's a little dirty but hopefully it will work.