
win32file.ReadDirectoryChangesW doesn't find all moved files


Good morning,

I've come across a peculiar problem with a program I'm creating in Python. It appears that when I drag and drop files from one location to another, not all of the files are registered as events by the modules.

I've been working with win32file and win32con to try and get all events related to moving files from one location to another for processing.

Here is a snippet of my detection code:

import win32file
import win32con
def main():
    path_to_watch = 'D:\\'
    _file_list_dir = 1
    # Create a watcher handle
    _h_dir = win32file.CreateFile(
        path_to_watch,
        _file_list_dir,
        win32con.FILE_SHARE_READ |
        win32con.FILE_SHARE_WRITE |
        win32con.FILE_SHARE_DELETE,
        None,
        win32con.OPEN_EXISTING,
        win32con.FILE_FLAG_BACKUP_SEMANTICS,
        None
    )
    while 1:
        results = win32file.ReadDirectoryChangesW(
            _h_dir,
            1024,
            True,
            win32con.FILE_NOTIFY_CHANGE_FILE_NAME |
            win32con.FILE_NOTIFY_CHANGE_DIR_NAME |
            win32con.FILE_NOTIFY_CHANGE_ATTRIBUTES |
            win32con.FILE_NOTIFY_CHANGE_SIZE |
            win32con.FILE_NOTIFY_CHANGE_LAST_WRITE |
            win32con.FILE_NOTIFY_CHANGE_SECURITY,
            None,
            None
        )
        for _action, _file in results:
            if _action == 1:
                print 'found!'
            if _action == 2:
                print 'deleted!'

I dragged and dropped 7 files and it only found 4.

# found!
# found!
# found!
# found!

What can I do to detect all dropped files?


Solution

  • [ActiveState.Docs]: win32file.ReadDirectoryChangesW (the best documentation that I could find for [GitHub]: mhammond/pywin32 - Python for Windows (pywin32) Extensions) is a wrapper over the [MS.Docs]: ReadDirectoryChangesW function. Here's what the Microsoft documentation states about the buffer:

    1. General

    When you first call ReadDirectoryChangesW, the system allocates a buffer to store change information. This buffer is associated with the directory handle until it is closed and its size does not change during its lifetime. Directory changes that occur between calls to this function are added to the buffer and then returned with the next call. If the buffer overflows, the entire contents of the buffer are discarded, the lpBytesReturned parameter contains zero, and the ReadDirectoryChangesW function fails with the error code ERROR_NOTIFY_ENUM_DIR.

    • My understanding is that this is a different buffer than the one passed as an argument (lpBuffer):

      • lpBuffer is allocated by the user before the function call and is passed to every call of ReadDirectoryChangesW (different buffers, with different sizes, could be passed for each call)

      • The other buffer is allocated by the system and is the one that stores data (probably in some raw format) between function calls. When the function is called, its contents are copied (and formatted) into lpBuffer, unless it overflowed (and was discarded) in the meantime

    2. Synchronous

    Upon successful synchronous completion, the lpBuffer parameter is a formatted buffer and the number of bytes written to the buffer is available in lpBytesReturned. If the number of bytes transferred is zero, the buffer was either too large for the system to allocate or too small to provide detailed information on all the changes that occurred in the directory or subtree. In this case, you should compute the changes by enumerating the directory or subtree.

    • This somewhat confirms my previous assumption

      • "the buffer was either too large for the system to allocate" - maybe when the buffer from previous point is allocated, it takes into account nBufferLength?
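The fallback that the documentation recommends when zero bytes come back (enumerating the directory yourself) could look roughly like the sketch below. It is not part of the original script: snapshot and diff_snapshots are hypothetical helper names, and the comparison only looks at file names, so a file replaced under the same name would go unnoticed.

```python
import os


def snapshot(root):
    """Return the set of file paths (relative to root) currently under root."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.add(os.path.relpath(full, root))
    return found


def diff_snapshots(before, after):
    """Return (added, removed) as sorted lists of relative paths."""
    added = sorted(after - before)
    removed = sorted(before - after)
    return added, removed
```

The idea would be to take one snapshot before entering the monitoring loop and another whenever a call returns no results, then treat the difference as the missed events.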

    Anyway, I took your code and changed it "a bit".

    code00.py:

    import sys
    import msvcrt
    import pywintypes
    import win32file
    import win32con
    import win32api
    import win32event
    
    
    FILE_LIST_DIRECTORY = 0x0001
    FILE_ACTION_ADDED = 0x00000001
    FILE_ACTION_REMOVED = 0x00000002
    
    ASYNC_TIMEOUT = 5000
    
    BUF_SIZE = 65536
    
    
    def get_dir_handle(dir_name, asynch):
        flags_and_attributes = win32con.FILE_FLAG_BACKUP_SEMANTICS
        if asynch:
            flags_and_attributes |= win32con.FILE_FLAG_OVERLAPPED
        dir_handle = win32file.CreateFile(
            dir_name,
            FILE_LIST_DIRECTORY,
            (win32con.FILE_SHARE_READ |
             win32con.FILE_SHARE_WRITE |
             win32con.FILE_SHARE_DELETE),
            None,
            win32con.OPEN_EXISTING,
            flags_and_attributes,
            None
        )
        return dir_handle
    
    
    def read_dir_changes(dir_handle, size_or_buf, overlapped):
        return win32file.ReadDirectoryChangesW(
            dir_handle,
            size_or_buf,
            True,
            (win32con.FILE_NOTIFY_CHANGE_FILE_NAME |
             win32con.FILE_NOTIFY_CHANGE_DIR_NAME |
             win32con.FILE_NOTIFY_CHANGE_ATTRIBUTES |
             win32con.FILE_NOTIFY_CHANGE_SIZE |
             win32con.FILE_NOTIFY_CHANGE_LAST_WRITE |
             win32con.FILE_NOTIFY_CHANGE_SECURITY),
            overlapped,
            None
        )
    
    
    def handle_results(results):
        for item in results:
            print("    {} {:d}".format(item, len(item[1])))
            _action, _ = item
            if _action == FILE_ACTION_ADDED:
                print("    found!")
            if _action == FILE_ACTION_REMOVED:
                print("    deleted!")
    
    
    def esc_pressed():
        return msvcrt.kbhit() and ord(msvcrt.getch()) == 27
    
    
    def monitor_dir_sync(dir_handle):
        idx = 0
        while True:
            print("Index: {:d}".format(idx))
            idx += 1
            results = read_dir_changes(dir_handle, BUF_SIZE, None)
            handle_results(results)
            if esc_pressed():
                break
    
    
    def monitor_dir_async(dir_handle):
        idx = 0
        buffer = win32file.AllocateReadBuffer(BUF_SIZE)
        overlapped = pywintypes.OVERLAPPED()
        overlapped.hEvent = win32event.CreateEvent(None, False, 0, None)
        while True:
            print("Index: {:d}".format(idx))
            idx += 1
            read_dir_changes(dir_handle, buffer, overlapped)
            rc = win32event.WaitForSingleObject(overlapped.hEvent, ASYNC_TIMEOUT)
            if rc == win32event.WAIT_OBJECT_0:
                buffer_size = win32file.GetOverlappedResult(dir_handle, overlapped, True)
                results = win32file.FILE_NOTIFY_INFORMATION(buffer, buffer_size)
                handle_results(results)
            elif rc == win32event.WAIT_TIMEOUT:
                #print("    timeout...")
                pass
            else:
                print("Received {:d}. Exiting".format(rc))
                break
            if esc_pressed():
                break
        win32api.CloseHandle(overlapped.hEvent)
    
    
    def monitor_dir(dir_name, asynch=False):
        dir_handle = get_dir_handle(dir_name, asynch)
        if asynch:
            monitor_dir_async(dir_handle)
        else:
            monitor_dir_sync(dir_handle)
        win32api.CloseHandle(dir_handle)
    
    
    def main():
        print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
        asynch = True
        print("Attempting {}ynchronous mode using a buffer {:d} bytes long...".format("As" if asynch else "S", BUF_SIZE))
        monitor_dir(".\\test", asynch=asynch)
    
    
    if __name__ == "__main__":
        main()
    

    Notes:

    • Used constants wherever possible
    • Split your code into functions so it's modular (and also to avoid duplicating it)
    • Added print statements to increase output
    • Added the asynchronous functionality (so the script doesn't hang forever if no activity in the dir)
    • Added a way to exit when user presses ESC (of course in synchronous mode an event in the dir must also occur)
    • Played with different values for different results

    Output:

    e:\Work\Dev\StackOverflow\q049799109>dir /b test
    0123456789.txt
    01234567890123456789.txt
    012345678901234567890123456789.txt
    0123456789012345678901234567890123456789.txt
    01234567890123456789012345678901234567890123456789.txt
    012345678901234567890123456789012345678901234567890123456789.txt
    0123456789012345678901234567890123456789012345678901234567890123456789.txt
    01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt
    012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt
    0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt
    
    e:\Work\Dev\StackOverflow\q049799109>
    e:\Work\Dev\StackOverflow\q049799109>"C:\Install\x64\HPE\OPSWpython\2.7.10__00\python.exe" code00.py
    Python 2.7.10 (default, Mar  8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)] on win32
    
    Attempting Synchronous mode using a buffer 512 bytes long...
    Index: 0
        (2, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
        deleted!
    Index: 1
        (2, u'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 94
        deleted!
    Index: 2
        (2, u'01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 84
        deleted!
    Index: 3
        (2, u'0123456789012345678901234567890123456789012345678901234567890123456789.txt') 74
        deleted!
        (2, u'012345678901234567890123456789012345678901234567890123456789.txt') 64
        deleted!
    Index: 4
        (2, u'01234567890123456789012345678901234567890123456789.txt') 54
        deleted!
    Index: 5
        (2, u'0123456789012345678901234567890123456789.txt') 44
        deleted!
        (2, u'012345678901234567890123456789.txt') 34
        deleted!
    Index: 6
        (2, u'01234567890123456789.txt') 24
        deleted!
        (2, u'0123456789.txt') 14
        deleted!
    Index: 7
        (1, u'0123456789.txt') 14
        found!
    Index: 8
        (3, u'0123456789.txt') 14
    Index: 9
        (1, u'01234567890123456789.txt') 24
        found!
    Index: 10
        (3, u'01234567890123456789.txt') 24
        (1, u'012345678901234567890123456789.txt') 34
        found!
        (3, u'012345678901234567890123456789.txt') 34
        (1, u'0123456789012345678901234567890123456789.txt') 44
        found!
    Index: 11
        (3, u'0123456789012345678901234567890123456789.txt') 44
        (1, u'01234567890123456789012345678901234567890123456789.txt') 54
        found!
        (3, u'01234567890123456789012345678901234567890123456789.txt') 54
    Index: 12
    Index: 13
        (1, u'01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 84
        found!
    Index: 14
    Index: 15
        (1, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
        found!
    Index: 16
        (3, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
    Index: 17
        (1, u'a') 1
        found!
    Index: 18
        (3, u'a') 1
    
    e:\Work\Dev\StackOverflow\q049799109>
    e:\Work\Dev\StackOverflow\q049799109>"C:\Install\x64\HPE\OPSWpython\2.7.10__00\python.exe" code00.py
    Python 2.7.10 (default, Mar  8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)] on win32
    
    Attempting Synchronous mode using a buffer 65536 bytes long...
    Index: 0
        (2, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
        deleted!
    Index: 1
        (2, u'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 94
        deleted!
    Index: 2
        (2, u'01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 84
        deleted!
    Index: 3
        (2, u'0123456789012345678901234567890123456789012345678901234567890123456789.txt') 74
        deleted!
    Index: 4
        (2, u'012345678901234567890123456789012345678901234567890123456789.txt') 64
        deleted!
    Index: 5
        (2, u'01234567890123456789012345678901234567890123456789.txt') 54
        deleted!
    Index: 6
        (2, u'0123456789012345678901234567890123456789.txt') 44
        deleted!
    Index: 7
        (2, u'012345678901234567890123456789.txt') 34
        deleted!
        (2, u'01234567890123456789.txt') 24
        deleted!
        (2, u'0123456789.txt') 14
        deleted!
    Index: 8
        (1, u'0123456789.txt') 14
        found!
    Index: 9
        (3, u'0123456789.txt') 14
    Index: 10
        (1, u'01234567890123456789.txt') 24
        found!
    Index: 11
        (3, u'01234567890123456789.txt') 24
    Index: 12
        (1, u'012345678901234567890123456789.txt') 34
        found!
    Index: 13
        (3, u'012345678901234567890123456789.txt') 34
    Index: 14
        (1, u'0123456789012345678901234567890123456789.txt') 44
        found!
    Index: 15
        (3, u'0123456789012345678901234567890123456789.txt') 44
    Index: 16
        (1, u'01234567890123456789012345678901234567890123456789.txt') 54
        found!
        (3, u'01234567890123456789012345678901234567890123456789.txt') 54
    Index: 17
        (1, u'012345678901234567890123456789012345678901234567890123456789.txt') 64
        found!
        (3, u'012345678901234567890123456789012345678901234567890123456789.txt') 64
        (1, u'0123456789012345678901234567890123456789012345678901234567890123456789.txt') 74
        found!
    Index: 18
        (3, u'0123456789012345678901234567890123456789012345678901234567890123456789.txt') 74
        (1, u'01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 84
        found!
        (3, u'01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 84
        (1, u'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 94
        found!
        (3, u'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 94
        (1, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
        found!
        (3, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
    Index: 20
        (2, u'a') 1
        deleted!
    
    e:\Work\Dev\StackOverflow\q049799109>
    e:\Work\Dev\StackOverflow\q049799109>"C:\Install\x64\HPE\OPSWpython\2.7.10__00\python.exe" code00.py
    Python 2.7.10 (default, Mar  8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)] on win32
    
    Attempting Asynchronous mode using a buffer 512 bytes long...
    Index: 0
    Index: 1
        (2, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
        deleted!
    Index: 2
        (2, u'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 94
        deleted!
    Index: 3
        (2, u'01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 84
        deleted!
    Index: 4
        (2, u'012345678901234567890123456789012345678901234567890123456789.txt') 64
        deleted!
    Index: 5
        (2, u'01234567890123456789012345678901234567890123456789.txt') 54
        deleted!
    Index: 6
        (2, u'0123456789012345678901234567890123456789.txt') 44
        deleted!
    Index: 7
        (2, u'012345678901234567890123456789.txt') 34
        deleted!
    Index: 8
        (2, u'01234567890123456789.txt') 24
        deleted!
    Index: 9
        (2, u'0123456789.txt') 14
        deleted!
    Index: 10
    Index: 11
    Index: 12
        (1, u'0123456789.txt') 14
        found!
    Index: 13
        (1, u'01234567890123456789.txt') 24
        found!
    Index: 14
        (1, u'012345678901234567890123456789.txt') 34
        found!
    Index: 15
        (3, u'012345678901234567890123456789.txt') 34
    Index: 16
        (1, u'0123456789012345678901234567890123456789.txt') 44
        found!
        (3, u'0123456789012345678901234567890123456789.txt') 44
    Index: 17
    Index: 18
        (1, u'0123456789012345678901234567890123456789012345678901234567890123456789.txt') 74
        found!
    Index: 19
    Index: 20
        (1, u'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 94
        found!
    Index: 21
    Index: 22
    Index: 23
    Index: 24
    
    e:\Work\Dev\StackOverflow\q049799109>
    e:\Work\Dev\StackOverflow\q049799109>"C:\Install\x64\HPE\OPSWpython\2.7.10__00\python.exe" code00.py
    Python 2.7.10 (default, Mar  8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)] on win32
    
    Attempting Asynchronous mode using a buffer 65536 bytes long...
    Index: 0
    Index: 1
        (2, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
        deleted!
    Index: 2
        (2, u'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 94
        deleted!
    Index: 3
        (2, u'01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 84
        deleted!
    Index: 4
        (2, u'0123456789012345678901234567890123456789012345678901234567890123456789.txt') 74
        deleted!
    Index: 5
        (2, u'012345678901234567890123456789012345678901234567890123456789.txt') 64
        deleted!
    Index: 6
        (2, u'01234567890123456789012345678901234567890123456789.txt') 54
        deleted!
    Index: 7
        (2, u'0123456789012345678901234567890123456789.txt') 44
        deleted!
    Index: 8
        (2, u'012345678901234567890123456789.txt') 34
        deleted!
        (2, u'01234567890123456789.txt') 24
        deleted!
    Index: 9
        (2, u'0123456789.txt') 14
        deleted!
    Index: 10
    Index: 11
    Index: 12
        (1, u'0123456789.txt') 14
        found!
    Index: 13
        (1, u'01234567890123456789.txt') 24
        found!
    Index: 14
        (1, u'012345678901234567890123456789.txt') 34
        found!
    Index: 15
        (3, u'012345678901234567890123456789.txt') 34
        (1, u'0123456789012345678901234567890123456789.txt') 44
        found!
        (3, u'0123456789012345678901234567890123456789.txt') 44
    Index: 16
        (1, u'01234567890123456789012345678901234567890123456789.txt') 54
        found!
        (3, u'01234567890123456789012345678901234567890123456789.txt') 54
        (1, u'012345678901234567890123456789012345678901234567890123456789.txt') 64
        found!
        (3, u'012345678901234567890123456789012345678901234567890123456789.txt') 64
        (1, u'0123456789012345678901234567890123456789012345678901234567890123456789.txt') 74
        found!
    Index: 17
        (3, u'0123456789012345678901234567890123456789012345678901234567890123456789.txt') 74
        (1, u'01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 84
        found!
        (3, u'01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 84
        (1, u'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 94
        found!
        (3, u'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 94
        (1, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
        found!
        (3, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
    Index: 18
    Index: 19
    

    Remarks:

    • Used a dir test containing 10 files with different names (repetitions of 0123456789)
    • There are 4 runs:
      1. Synchronous
        • 512B buffer
        • 64K buffer
      2. Asynchronous
        • 512B buffer
        • 64K buffer
    • For each (above) run, the files are (using Windows Commander to operate):
      • Moved from the dir (involved delete)
      • Moved (back) to the dir (involved add)
    • It's just one run for each combination, which is far from a reliable benchmark, but I ran the script several times and the pattern tends to be consistent
    • Deleting files doesn't vary much across runs, which suggests that the events are evenly distributed over (the tiny amounts of) time
    • Adding files, on the other hand, depends on the buffer size. Another noticeable thing is that each addition generates 2 events
    • From a performance perspective, asynchronous mode doesn't bring any improvement (as I was expecting); on the contrary, it tends to slow things down. Its biggest advantage is the possibility of exiting gracefully on timeout (an abnormal interrupt might keep resources locked until program exit, and sometimes even beyond!)

    Bottom line is that there's no recipe to avoid losing events. Every measure taken can be "beaten" by increasing the number of generated events.
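To get a feel for how quickly a buffer fills up, here is a rough estimate of the space that the formatted records need. It assumes the FILE_NOTIFY_INFORMATION layout (three DWORD fields followed by the name in 2-byte characters, each record aligned to 4 bytes) and names made only of characters that fit in one UTF-16 unit. record_size is a hypothetical helper, and the format of the internal system buffer is not documented, so treat the numbers as an approximation.

```python
HEADER_BYTES = 12  # NextEntryOffset, Action, FileNameLength: three DWORDs


def record_size(name):
    """Approximate size in bytes of one formatted record for this name."""
    raw = HEADER_BYTES + 2 * len(name)
    return (raw + 3) // 4 * 4  # round up to a multiple of 4


# The ten test files: "0123456789" repeated 1..10 times, plus ".txt".
names = [("0123456789" * n) + ".txt" for n in range(1, 11)]

one_event_each = sum(record_size(n) for n in names)
print(one_event_each)      # 1300 bytes: one event per file
print(2 * one_event_each)  # 2600 bytes: "added" + "modified" per file
```

Under these assumptions, a single burst touching all ten files would not fit in a 512-byte buffer, while 64K would hold it many times over.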

    Minimizing the losses:

    • The buffer size. This was the (main) problem in your case. Unfortunately, the documentation is unclear and gives no guidelines on how large it should be. Browsing C forums, I noticed that 64K is a common value. However:

      • It isn't possible to have a huge buffer and in case of failures to decrease its size until success, because that would mean losing all the events generated while figuring out the buffer size

      • Even though 64K is enough to hold (several times over) all the events that I generated in my tests, some were still lost. Maybe that's because of the "magical" system buffer that I talked about at the beginning

    • Reduce the number of events as much as possible. In your case I noticed that you're only interested in add and delete events (FILE_ACTION_ADDED and FILE_ACTION_REMOVED). Only pass the appropriate FILE_NOTIFY_CHANGE_* flags to ReadDirectoryChangesW (for example, you don't care about FILE_ACTION_MODIFIED, but you are receiving it when adding files)

    • Try splitting the dir contents into several subdirs and monitoring them concurrently. For example, if you only care about changes that occur in one dir and a bunch of its subdirs, there's no point in recursively monitoring the whole tree, because it will most likely produce lots of useless events. Anyway, if doing things in parallel, don't use threads because of the GIL ([Python.Wiki]: GlobalInterpreterLock). Use [Python.Docs]: multiprocessing - Process-based “threading” interface instead

    • Increase the speed of the code that runs in the loop so it spends as little time as possible outside ReadDirectoryChangesW (when generated events could overflow the buffer). Of course, some of the items below might have an insignificant influence (and also some bad side effects), but I'm listing them anyway:

      • Do as little processing as possible and try to delay it. Maybe do it in another process (because of the GIL)

      • Get rid of all print like statements

      • Instead of e.g. win32con.FILE_NOTIFY_CHANGE_FILE_NAME use from win32con import FILE_NOTIFY_CHANGE_FILE_NAME at the beginning of the script, and only use FILE_NOTIFY_CHANGE_FILE_NAME in the loop (to avoid variable lookup in the module)

      • Don't use functions (because of call / ret like instructions) - not sure about that

      • Try using win32file.GetQueuedCompletionStatus method to get the results (async only)

      • Since in time, things tend to get better (there are exceptions, of course), try switching to a newer Python version. Maybe it will run faster

      • Use C - this is probably undesirable, but it could have some benefits:

        • There won't be the back and forth conversions between Python and C that PyWin32 performs - but I didn't use a profiler to check how much time is spent in them

        • lpCompletionRoutine (that PyWin32 doesn't offer) would be available too, maybe it's faster

        • As an alternative, C could be invoked using CTypes, but that would require some work and I feel that it wouldn't be worth it
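As a sketch of the "reduce the number of events" advice from the list above: requesting only name changes should stop the size, timestamp, attribute and security notifications that show up as FILE_ACTION_MODIFIED. The numeric values are the ones from the Windows headers (the same constants that win32con exposes), spelled out here so the snippet runs without PyWin32. NAMES_ONLY_FILTER and keep_adds_and_removes are hypothetical names, not something the original script defines.

```python
# FILE_NOTIFY_CHANGE_* values from the Windows headers.
FILE_NOTIFY_CHANGE_FILE_NAME = 0x001
FILE_NOTIFY_CHANGE_DIR_NAME = 0x002
FILE_NOTIFY_CHANGE_ATTRIBUTES = 0x004
FILE_NOTIFY_CHANGE_SIZE = 0x008
FILE_NOTIFY_CHANGE_LAST_WRITE = 0x010
FILE_NOTIFY_CHANGE_SECURITY = 0x100

# The filter used in the question and in code00.py.
ORIGINAL_FILTER = (FILE_NOTIFY_CHANGE_FILE_NAME |
                   FILE_NOTIFY_CHANGE_DIR_NAME |
                   FILE_NOTIFY_CHANGE_ATTRIBUTES |
                   FILE_NOTIFY_CHANGE_SIZE |
                   FILE_NOTIFY_CHANGE_LAST_WRITE |
                   FILE_NOTIFY_CHANGE_SECURITY)

# A narrower filter: only creations, deletions and renames.
NAMES_ONLY_FILTER = (FILE_NOTIFY_CHANGE_FILE_NAME |
                     FILE_NOTIFY_CHANGE_DIR_NAME)

FILE_ACTION_ADDED = 1
FILE_ACTION_REMOVED = 2


def keep_adds_and_removes(results):
    """Drop every (action, name) tuple that is not an add or a remove."""
    wanted = (FILE_ACTION_ADDED, FILE_ACTION_REMOVED)
    return [(action, name) for action, name in results if action in wanted]
```

NAMES_ONLY_FILTER would be passed as the notify filter argument of win32file.ReadDirectoryChangesW in place of the longer mask. Filtering in Python with keep_adds_and_removes only tidies the results; it is the narrower mask that reduces what the system has to buffer.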