Good morning,
I've come across a peculiar problem with a program I'm creating in Python. It appears that when I drag and drop files from one location to another, not all of the files are registered as events by the modules.
I've been working with win32file and win32con to try and get all events related to moving files from one location to another for processing.
Here is a snippet of my detection code:
import win32file
import win32con

def main():
    path_to_watch = 'D:\\'
    _file_list_dir = 1  # FILE_LIST_DIRECTORY access right
    # Create a watcher handle
    _h_dir = win32file.CreateFile(
        path_to_watch,
        _file_list_dir,
        win32con.FILE_SHARE_READ |
        win32con.FILE_SHARE_WRITE |
        win32con.FILE_SHARE_DELETE,
        None,
        win32con.OPEN_EXISTING,
        win32con.FILE_FLAG_BACKUP_SEMANTICS,
        None
    )
    while 1:
        # Blocks until at least one change is reported (1024 byte buffer)
        results = win32file.ReadDirectoryChangesW(
            _h_dir,
            1024,
            True,
            win32con.FILE_NOTIFY_CHANGE_FILE_NAME |
            win32con.FILE_NOTIFY_CHANGE_DIR_NAME |
            win32con.FILE_NOTIFY_CHANGE_ATTRIBUTES |
            win32con.FILE_NOTIFY_CHANGE_SIZE |
            win32con.FILE_NOTIFY_CHANGE_LAST_WRITE |
            win32con.FILE_NOTIFY_CHANGE_SECURITY,
            None,
            None
        )
        for _action, _file in results:
            if _action == 1:  # FILE_ACTION_ADDED
                print('found!')
            if _action == 2:  # FILE_ACTION_REMOVED
                print('deleted!')
I dragged and dropped 7 files and it only found 4.
# found!
# found!
# found!
# found!
What can I do to detect all dropped files?
[ActiveState.Docs]: win32file.ReadDirectoryChangesW (this is the best documentation that I could find for [GitHub]: mhammond/pywin32 - Python for Windows (pywin32) Extensions) is a wrapper over [MS.Docs]: ReadDirectoryChangesW function. Here's what the latter states about the buffer:
When you first call ReadDirectoryChangesW, the system allocates a buffer to store change information. This buffer is associated with the directory handle until it is closed and its size does not change during its lifetime. Directory changes that occur between calls to this function are added to the buffer and then returned with the next call. If the buffer overflows, the entire contents of the buffer are discarded, the lpBytesReturned parameter contains zero, and the ReadDirectoryChangesW function fails with the error code ERROR_NOTIFY_ENUM_DIR.
My understanding is that this system buffer is different from the one passed as an argument (lpBuffer):
lpBuffer is allocated by the user before the function call and is passed to every call of ReadDirectoryChangesW (different buffers, with different sizes, could be passed for each call)
The system buffer is allocated by the system on the first call, and it is the one that stores data (probably in some raw format) between function calls. When the function is called, its contents are copied (and formatted) into lpBuffer, unless it overflowed (and was discarded) in the meantime
Upon successful synchronous completion, the lpBuffer parameter is a formatted buffer and the number of bytes written to the buffer is available in lpBytesReturned. If the number of bytes transferred is zero, the buffer was either too large for the system to allocate or too small to provide detailed information on all the changes that occurred in the directory or subtree. In this case, you should compute the changes by enumerating the directory or subtree.
This somewhat confirms my previous assumption.
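The "compute the changes by enumerating the directory" fallback from the quote above could be sketched as below. This is only an illustration, not part of PyWin32: snapshot and diff_snapshots are hypothetical helper names, and the idea is to take one snapshot up front and another whenever ReadDirectoryChangesW returns nothing or fails with ERROR_NOTIFY_ENUM_DIR.

```python
import os


def snapshot(root):
    """Return the set of paths (relative to root) of every file and dir under root."""
    entries = set()
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            full = os.path.join(dirpath, name)
            entries.add(os.path.relpath(full, root))
    return entries


def diff_snapshots(before, after):
    """Return (added, removed) as sorted lists of relative paths."""
    return sorted(after - before), sorted(before - after)
```

Keeping the last snapshot around and replacing it after every diff would give the added / removed sets for the window in which events were discarded, at the cost of a full directory walk.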
Anyway, I took your code and changed it "a bit".
code00.py:
import sys
import msvcrt
import pywintypes
import win32file
import win32con
import win32api
import win32event

FILE_LIST_DIRECTORY = 0x0001
FILE_ACTION_ADDED = 0x00000001
FILE_ACTION_REMOVED = 0x00000002
ASYNC_TIMEOUT = 5000  # milliseconds
BUF_SIZE = 65536  # bytes


def get_dir_handle(dir_name, asynch):
    # FILE_FLAG_BACKUP_SEMANTICS is required to open a directory
    flags_and_attributes = win32con.FILE_FLAG_BACKUP_SEMANTICS
    if asynch:
        flags_and_attributes |= win32con.FILE_FLAG_OVERLAPPED
    dir_handle = win32file.CreateFile(
        dir_name,
        FILE_LIST_DIRECTORY,
        (win32con.FILE_SHARE_READ |
         win32con.FILE_SHARE_WRITE |
         win32con.FILE_SHARE_DELETE),
        None,
        win32con.OPEN_EXISTING,
        flags_and_attributes,
        None
    )
    return dir_handle


def read_dir_changes(dir_handle, size_or_buf, overlapped):
    # size_or_buf: a size (sync mode) or a preallocated buffer (async mode)
    return win32file.ReadDirectoryChangesW(
        dir_handle,
        size_or_buf,
        True,
        (win32con.FILE_NOTIFY_CHANGE_FILE_NAME |
         win32con.FILE_NOTIFY_CHANGE_DIR_NAME |
         win32con.FILE_NOTIFY_CHANGE_ATTRIBUTES |
         win32con.FILE_NOTIFY_CHANGE_SIZE |
         win32con.FILE_NOTIFY_CHANGE_LAST_WRITE |
         win32con.FILE_NOTIFY_CHANGE_SECURITY),
        overlapped,
        None
    )


def handle_results(results):
    # Each item is an (action, file_name) pair
    for item in results:
        print(" {} {:d}".format(item, len(item[1])))
        _action, _ = item
        if _action == FILE_ACTION_ADDED:
            print(" found!")
        if _action == FILE_ACTION_REMOVED:
            print(" deleted!")


def esc_pressed():
    return msvcrt.kbhit() and ord(msvcrt.getch()) == 27


def monitor_dir_sync(dir_handle):
    idx = 0
    while True:
        print("Index: {:d}".format(idx))
        idx += 1
        results = read_dir_changes(dir_handle, BUF_SIZE, None)
        handle_results(results)
        if esc_pressed():
            break


def monitor_dir_async(dir_handle):
    idx = 0
    buffer = win32file.AllocateReadBuffer(BUF_SIZE)
    overlapped = pywintypes.OVERLAPPED()
    overlapped.hEvent = win32event.CreateEvent(None, False, 0, None)
    while True:
        print("Index: {:d}".format(idx))
        idx += 1
        read_dir_changes(dir_handle, buffer, overlapped)
        rc = win32event.WaitForSingleObject(overlapped.hEvent, ASYNC_TIMEOUT)
        if rc == win32event.WAIT_OBJECT_0:
            buffer_size = win32file.GetOverlappedResult(dir_handle, overlapped, True)
            results = win32file.FILE_NOTIFY_INFORMATION(buffer, buffer_size)
            handle_results(results)
        elif rc == win32event.WAIT_TIMEOUT:
            #print(" timeout...")
            pass
        else:
            print("Received {:d}. Exiting".format(rc))
            break
        if esc_pressed():
            break
    win32api.CloseHandle(overlapped.hEvent)


def monitor_dir(dir_name, asynch=False):
    dir_handle = get_dir_handle(dir_name, asynch)
    if asynch:
        monitor_dir_async(dir_handle)
    else:
        monitor_dir_sync(dir_handle)
    win32api.CloseHandle(dir_handle)


def main():
    print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
    asynch = True
    print("Attempting {}ynchronous mode using a buffer {:d} bytes long...".format("As" if asynch else "S", BUF_SIZE))
    monitor_dir(".\\test", asynch=asynch)


if __name__ == "__main__":
    main()
Notes:
Output:
e:\Work\Dev\StackOverflow\q049799109>dir /b test
0123456789.txt
01234567890123456789.txt
012345678901234567890123456789.txt
0123456789012345678901234567890123456789.txt
01234567890123456789012345678901234567890123456789.txt
012345678901234567890123456789012345678901234567890123456789.txt
0123456789012345678901234567890123456789012345678901234567890123456789.txt
01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt
012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt
0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt

e:\Work\Dev\StackOverflow\q049799109>
e:\Work\Dev\StackOverflow\q049799109>"C:\Install\x64\HPE\OPSWpython\2.7.10__00\python.exe" code00.py
Python 2.7.10 (default, Mar 8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)] on win32

Attempting Synchronous mode using a buffer 512 bytes long...
Index: 0
 (2, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
 deleted!
Index: 1
 (2, u'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 94
 deleted!
Index: 2
 (2, u'01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 84
 deleted!
Index: 3
 (2, u'0123456789012345678901234567890123456789012345678901234567890123456789.txt') 74
 deleted!
 (2, u'012345678901234567890123456789012345678901234567890123456789.txt') 64
 deleted!
Index: 4
 (2, u'01234567890123456789012345678901234567890123456789.txt') 54
 deleted!
Index: 5
 (2, u'0123456789012345678901234567890123456789.txt') 44
 deleted!
 (2, u'012345678901234567890123456789.txt') 34
 deleted!
Index: 6
 (2, u'01234567890123456789.txt') 24
 deleted!
 (2, u'0123456789.txt') 14
 deleted!
Index: 7
 (1, u'0123456789.txt') 14
 found!
Index: 8
 (3, u'0123456789.txt') 14
Index: 9
 (1, u'01234567890123456789.txt') 24
 found!
Index: 10
 (3, u'01234567890123456789.txt') 24
 (1, u'012345678901234567890123456789.txt') 34
 found!
 (3, u'012345678901234567890123456789.txt') 34
 (1, u'0123456789012345678901234567890123456789.txt') 44
 found!
Index: 11
 (3, u'0123456789012345678901234567890123456789.txt') 44
 (1, u'01234567890123456789012345678901234567890123456789.txt') 54
 found!
 (3, u'01234567890123456789012345678901234567890123456789.txt') 54
Index: 12
Index: 13
 (1, u'01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 84
 found!
Index: 14
Index: 15
 (1, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
 found!
Index: 16
 (3, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
Index: 17
 (1, u'a') 1
 found!
Index: 18
 (3, u'a') 1

e:\Work\Dev\StackOverflow\q049799109>
e:\Work\Dev\StackOverflow\q049799109>"C:\Install\x64\HPE\OPSWpython\2.7.10__00\python.exe" code00.py
Python 2.7.10 (default, Mar 8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)] on win32

Attempting Synchronous mode using a buffer 65536 bytes long...
Index: 0
 (2, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
 deleted!
Index: 1
 (2, u'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 94
 deleted!
Index: 2
 (2, u'01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 84
 deleted!
Index: 3
 (2, u'0123456789012345678901234567890123456789012345678901234567890123456789.txt') 74
 deleted!
Index: 4
 (2, u'012345678901234567890123456789012345678901234567890123456789.txt') 64
 deleted!
Index: 5
 (2, u'01234567890123456789012345678901234567890123456789.txt') 54
 deleted!
Index: 6
 (2, u'0123456789012345678901234567890123456789.txt') 44
 deleted!
Index: 7
 (2, u'012345678901234567890123456789.txt') 34
 deleted!
 (2, u'01234567890123456789.txt') 24
 deleted!
 (2, u'0123456789.txt') 14
 deleted!
Index: 8
 (1, u'0123456789.txt') 14
 found!
Index: 9
 (3, u'0123456789.txt') 14
Index: 10
 (1, u'01234567890123456789.txt') 24
 found!
Index: 11
 (3, u'01234567890123456789.txt') 24
Index: 12
 (1, u'012345678901234567890123456789.txt') 34
 found!
Index: 13
 (3, u'012345678901234567890123456789.txt') 34
Index: 14
 (1, u'0123456789012345678901234567890123456789.txt') 44
 found!
Index: 15
 (3, u'0123456789012345678901234567890123456789.txt') 44
Index: 16
 (1, u'01234567890123456789012345678901234567890123456789.txt') 54
 found!
 (3, u'01234567890123456789012345678901234567890123456789.txt') 54
Index: 17
 (1, u'012345678901234567890123456789012345678901234567890123456789.txt') 64
 found!
 (3, u'012345678901234567890123456789012345678901234567890123456789.txt') 64
 (1, u'0123456789012345678901234567890123456789012345678901234567890123456789.txt') 74
 found!
Index: 18
 (3, u'0123456789012345678901234567890123456789012345678901234567890123456789.txt') 74
 (1, u'01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 84
 found!
 (3, u'01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 84
 (1, u'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 94
 found!
 (3, u'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 94
 (1, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
 found!
 (3, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
Index: 20
 (2, u'a') 1
 deleted!

e:\Work\Dev\StackOverflow\q049799109>
e:\Work\Dev\StackOverflow\q049799109>"C:\Install\x64\HPE\OPSWpython\2.7.10__00\python.exe" code00.py
Python 2.7.10 (default, Mar 8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)] on win32

Attempting Asynchronous mode using a buffer 512 bytes long...
Index: 0
Index: 1
 (2, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
 deleted!
Index: 2
 (2, u'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 94
 deleted!
Index: 3
 (2, u'01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 84
 deleted!
Index: 4
 (2, u'012345678901234567890123456789012345678901234567890123456789.txt') 64
 deleted!
Index: 5
 (2, u'01234567890123456789012345678901234567890123456789.txt') 54
 deleted!
Index: 6
 (2, u'0123456789012345678901234567890123456789.txt') 44
 deleted!
Index: 7
 (2, u'012345678901234567890123456789.txt') 34
 deleted!
Index: 8
 (2, u'01234567890123456789.txt') 24
 deleted!
Index: 9
 (2, u'0123456789.txt') 14
 deleted!
Index: 10
Index: 11
Index: 12
 (1, u'0123456789.txt') 14
 found!
Index: 13
 (1, u'01234567890123456789.txt') 24
 found!
Index: 14
 (1, u'012345678901234567890123456789.txt') 34
 found!
Index: 15
 (3, u'012345678901234567890123456789.txt') 34
Index: 16
 (1, u'0123456789012345678901234567890123456789.txt') 44
 found!
 (3, u'0123456789012345678901234567890123456789.txt') 44
Index: 17
Index: 18
 (1, u'0123456789012345678901234567890123456789012345678901234567890123456789.txt') 74
 found!
Index: 19
Index: 20
 (1, u'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 94
 found!
Index: 21
Index: 22
Index: 23
Index: 24

e:\Work\Dev\StackOverflow\q049799109>
e:\Work\Dev\StackOverflow\q049799109>"C:\Install\x64\HPE\OPSWpython\2.7.10__00\python.exe" code00.py
Python 2.7.10 (default, Mar 8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)] on win32

Attempting Asynchronous mode using a buffer 65536 bytes long...
Index: 0
Index: 1
 (2, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
 deleted!
Index: 2
 (2, u'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 94
 deleted!
Index: 3
 (2, u'01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 84
 deleted!
Index: 4
 (2, u'0123456789012345678901234567890123456789012345678901234567890123456789.txt') 74
 deleted!
Index: 5
 (2, u'012345678901234567890123456789012345678901234567890123456789.txt') 64
 deleted!
Index: 6
 (2, u'01234567890123456789012345678901234567890123456789.txt') 54
 deleted!
Index: 7
 (2, u'0123456789012345678901234567890123456789.txt') 44
 deleted!
Index: 8
 (2, u'012345678901234567890123456789.txt') 34
 deleted!
 (2, u'01234567890123456789.txt') 24
 deleted!
Index: 9
 (2, u'0123456789.txt') 14
 deleted!
Index: 10
Index: 11
Index: 12
 (1, u'0123456789.txt') 14
 found!
Index: 13
 (1, u'01234567890123456789.txt') 24
 found!
Index: 14
 (1, u'012345678901234567890123456789.txt') 34
 found!
Index: 15
 (3, u'012345678901234567890123456789.txt') 34
 (1, u'0123456789012345678901234567890123456789.txt') 44
 found!
 (3, u'0123456789012345678901234567890123456789.txt') 44
Index: 16
 (1, u'01234567890123456789012345678901234567890123456789.txt') 54
 found!
 (3, u'01234567890123456789012345678901234567890123456789.txt') 54
 (1, u'012345678901234567890123456789012345678901234567890123456789.txt') 64
 found!
 (3, u'012345678901234567890123456789012345678901234567890123456789.txt') 64
 (1, u'0123456789012345678901234567890123456789012345678901234567890123456789.txt') 74
 found!
Index: 17
 (3, u'0123456789012345678901234567890123456789012345678901234567890123456789.txt') 74
 (1, u'01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 84
 found!
 (3, u'01234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 84
 (1, u'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 94
 found!
 (3, u'012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 94
 (1, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
 found!
 (3, u'0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789.txt') 104
Index: 18
Index: 19
Remarks:
Bottom line is that there's no recipe to avoid losing events. Every measure taken can be "beaten" by increasing the number of generated events.
Minimizing the losses:
The buffer size. This was the main problem in your case (you passed 1024 bytes). Unfortunately, the documentation is unclear on this point: it gives no guidelines on how large the buffer should be. Browsing C forums, I noticed that 64K is a common value. However:
It isn't possible to start with a huge buffer and shrink it on failure until a call succeeds, because all the events generated while probing for the right size would be lost
Even though 64K is enough to hold, several times over, all the events that I generated in my tests, some were still lost. Maybe that's because of the "magical" system buffer that I talked about at the beginning
Reduce the number of events as much as possible. In your case I noticed that you're only interested in add and delete events (FILE_ACTION_ADDED and FILE_ACTION_REMOVED). Only pass the FILE_NOTIFY_CHANGE_* flags that you actually need to ReadDirectoryChangesW (for example, you don't care about FILE_ACTION_MODIFIED, but you are receiving it when adding files)
Try splitting the dir contents into several subdirs and monitoring them concurrently. For example, if you only care about changes that occurred in one dir and a bunch of its subdirs, there's no point in recursively monitoring the whole tree, because it will most likely produce lots of useless events. Anyway, if doing things in parallel, don't use threads, because of the GIL ([Python.Wiki]: GlobalInterpreterLock). Use [Python.Docs]: multiprocessing - Process-based “threading” interface instead
Increase the speed of the code that runs in the loop, so that it spends as little time as possible outside ReadDirectoryChangesW (which is when generated events could overflow the buffer). Of course, some of the items below might have an insignificant influence (and also some bad side effects), but I'm listing them anyway:
Do as little processing as possible and try to delay it. Maybe do it in another process (because of the GIL)
Get rid of all print-like statements
Instead of e.g. win32con.FILE_NOTIFY_CHANGE_FILE_NAME, use from win32con import FILE_NOTIFY_CHANGE_FILE_NAME at the beginning of the script, and only use FILE_NOTIFY_CHANGE_FILE_NAME in the loop (to avoid the attribute lookup on the module)
Don't use functions (because of call / ret like instructions) - not sure about that
Try using the win32file.GetQueuedCompletionStatus function to get the results (async only)
Since things tend to get better over time (there are exceptions, of course), try switching to a newer Python version. Maybe it will run faster
Use C - this is probably undesirable, but it could have some benefits:
There won't be the back and forth conversions between Python and C that PyWin32 performs - but I didn't use a profiler to check how much time is spent in them
lpCompletionRoutine (which PyWin32 doesn't offer) would be available too; maybe it's faster
As an alternative, C could be invoked using CTypes, but that would require some work and I feel that it won't be worth it
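To illustrate two of the points above (a narrower notify filter, and delaying the processing), here is a minimal sketch. It assumes that read_changes is any callable returning a list of (action, file_name) pairs, such as a wrapper around win32file.ReadDirectoryChangesW. The names collect, summarize and NAME_CHANGES_ONLY are hypothetical, and the numeric constants are the Win32 values that win32con also exposes, repeated here only so that the sketch runs without PyWin32.

```python
from collections import deque

# Win32 values (also available from win32con); hardcoded for this sketch only.
FILE_NOTIFY_CHANGE_FILE_NAME = 0x00000001
FILE_NOTIFY_CHANGE_DIR_NAME = 0x00000002
FILE_ACTION_ADDED = 0x00000001
FILE_ACTION_REMOVED = 0x00000002

# Name changes only: no size / last write / attribute / security notifications.
NAME_CHANGES_ONLY = FILE_NOTIFY_CHANGE_FILE_NAME | FILE_NOTIFY_CHANGE_DIR_NAME


def collect(read_changes, batches):
    """Hot loop: call read_changes() `batches` times and only store the raw results."""
    pending = deque()
    append = pending.append  # local alias, saves an attribute lookup per iteration
    for _ in range(batches):
        append(read_changes())
    return pending


def summarize(pending):
    """Deferred processing: split the stored (action, name) pairs into added / removed."""
    added, removed = [], []
    for batch in pending:
        for action, name in batch:
            if action == FILE_ACTION_ADDED:
                added.append(name)
            elif action == FILE_ACTION_REMOVED:
                removed.append(name)
    return added, removed
```

In a real monitor, NAME_CHANGES_ONLY would be passed as the notify filter argument, and the loop would only store results, leaving anything like summarize to run later or in another process.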