I am using a FileSystemWatcher to monitor a folder and check whether a new folder appears. I then have to copy some of the files from there to another location, but first I have to wait until the folder has been fully copied. This is the code:
bool waiting = true;
var watcher = new FileSystemWatcher(path);
watcher.Created += (obj, args) =>
{
    //do something
    waiting = false;
};
watcher.NotifyFilter = NotifyFilters.DirectoryName;
watcher.EnableRaisingEvents = true;
while (waiting)
{
}
The problem is that I get notified as soon as the folder is created, so the "do something" part runs even though the folder isn't fully copied yet, and that causes problems. I have to somehow wait for the folder to be fully copied before the "do something" part runs. How can I do that?
That's a common problem faced by all file syncing applications like Dropbox, OneDrive etc. Copying a large file involves one Created event and multiple Changed events because the file can't be created in a single operation. There's no Closed event, so applications can only wait until the Changed events stop before they start hashing and syncing again.
In fact, you'll notice that when you copy a lot of files into a folder monitored by Dropbox et al., they stop what they were doing and wait for a bit after copying stops.
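The wait-until-quiet idea can be sketched with nothing but a timer that is restarted on every event. This is only a minimal illustration, not how Dropbox or OneDrive actually do it. `QuietPeriodTrigger`, `Signal` and `Demo.Watch` are hypothetical names, and the 5-second quiet period is an assumption you would tune:

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper: runs onQuiet once no Signal() call has arrived for quietPeriod.
public sealed class QuietPeriodTrigger : IDisposable
{
    private readonly Timer _timer;
    private readonly TimeSpan _quietPeriod;

    public QuietPeriodTrigger(TimeSpan quietPeriod, Action onQuiet)
    {
        _quietPeriod = quietPeriod;
        // Created disarmed; the first Signal() arms it.
        _timer = new Timer(_ => onQuiet(), null,
                           Timeout.InfiniteTimeSpan, Timeout.InfiniteTimeSpan);
    }

    // Restart the countdown; every new event pushes the callback further out.
    public void Signal() => _timer.Change(_quietPeriod, Timeout.InfiniteTimeSpan);

    public void Dispose() => _timer.Dispose();
}

public static class Demo
{
    // Blocks until the first event has been seen and 5 quiet seconds have followed it.
    public static void Watch(string path)
    {
        using var done = new ManualResetEventSlim();
        using var trigger = new QuietPeriodTrigger(TimeSpan.FromSeconds(5), done.Set);
        using var watcher = new FileSystemWatcher(path) { IncludeSubdirectories = true };

        watcher.Created += (s, e) => trigger.Signal();
        watcher.Changed += (s, e) => trigger.Signal();
        watcher.EnableRaisingEvents = true;

        done.Wait();
    }
}
```

Calling `Demo.Watch(path)` replaces the busy `while` loop from the question: it returns only after events have stopped arriving for the whole quiet period.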
Reactive Extensions in .NET, Java, JavaScript and other languages allow the use of LINQ-like queries over streams of events. One of the available operators is Debounce, which waits until a stream of events has quieted down before emitting the last one. This operator (called Throttle in .NET) can be used to detect when file creations have stopped.
This example waits 5 seconds after the last file creation before calling the subscriber method:
using System;
using System.IO;
using System.Reactive.Linq; // from the System.Reactive NuGet package

using (var fsw = new FileSystemWatcher(@"K:\Backups"))
{
    fsw.InternalBufferSize = 65536;
    var creations = Observable.FromEventPattern<FileSystemEventHandler, FileSystemEventArgs>(
        h => fsw.Created += h,
        h => fsw.Created -= h);
    creations.Timestamp()
        .Throttle(TimeSpan.FromSeconds(5))
        .Select(x => $"{x.Timestamp} : {DateTime.Now - x.Timestamp} - {x.Value.EventArgs.FullPath}")
        .Subscribe(Console.WriteLine);

    fsw.EnableRaisingEvents = true; // no events are raised without this
    Console.ReadLine();             // keep the watcher alive; leaving the block disposes it
}
Timestamp is used to add a Timestamp property to each event, to demonstrate the time difference between file creation and execution of the subscriber.
By returning only the last event, this single Throttle() can be used to signal processing of the entire folder. To handle individual files, we need to throttle the event streams generated by each file separately. In other words, to group events by file:
var obs = from creation in creations
          group creation by creation.EventArgs.FullPath into g
          from last in g.Throttle(TimeSpan.FromSeconds(5))
          select last.EventArgs.FullPath;
obs.Subscribe(Console.WriteLine);
The LINQ query syntax is a lot easier to read in this case. group by groups events by file name and then Throttle() emits the last event per file after 5 seconds of quiet.
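For comparison, here is roughly what the same grouped query looks like in method syntax. It's a sketch that assumes the System.Reactive package. `PerFileThrottle` and `QuietPaths` are made-up names, and the quiet period is passed in rather than fixed at 5 seconds:

```csharp
using System;
using System.IO;
using System.Reactive;
using System.Reactive.Linq;

public static class PerFileThrottle
{
    // Method-syntax form of the grouped query: one Throttle per distinct path.
    public static IObservable<string> QuietPaths(
        IObservable<EventPattern<FileSystemEventArgs>> events, TimeSpan quiet) =>
        events
            .GroupBy(e => e.EventArgs.FullPath)    // one inner stream per file
            .SelectMany(g => g.Throttle(quiet))    // last event of each file after the quiet period
            .Select(e => e.EventArgs.FullPath);
}
```

With the `creations` observable from the earlier example, `PerFileThrottle.QuietPaths(creations, TimeSpan.FromSeconds(5)).Subscribe(Console.WriteLine);` should behave like the query above. The nested GroupBy/SelectMany is what the `group ... into g from ... in g` clauses hide.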
To make this work with large files, we'd need to combine both Created and Changed events. That's the job of the Merge operator:
var changes = Observable.FromEventPattern<FileSystemEventHandler, FileSystemEventArgs>(
    h => fsw.Changed += h,
    h => fsw.Changed -= h);
var obs = from evt in creations.Merge(changes)
          group evt by evt.EventArgs.FullPath into g
          from last in g.Throttle(TimeSpan.FromSeconds(5))
          select $"{last.EventArgs.ChangeType} - {last.EventArgs.FullPath}";
And that's where things go BOOM!
Copying on Windows 10 raises only two Changed events, the last one only when copying is finished. If the file is too large (GBs or 100s of MBs, depending on disk speed), the second event may arrive only after the throttle period has already expired.
One option would be to set a large TimeSpan, large enough to cover most IO operations, e.g. TimeSpan.FromMinutes(1).
Another option would be to use another operator, Buffer(), which can capture a specified number of items in a batch and return them as a list:
var obs = from evt in creations.Merge(changes)
          group evt by evt.EventArgs.FullPath into g
          from last in g.Buffer(3)
          select $"{last[2].EventArgs.ChangeType} - {last[2].EventArgs.FullPath}";
This only works when copying though. Saving a file, e.g. from Excel or Word, may result in multiple Changed events as the application makes multiple changes to the file.
Buffer can also take a TimeSpan argument, which could be used to gather all Changed events per file and check them to see whether they fit one of the two patterns. Multiple changes? Just a start/end Changed event? When did the last event occur (that's provided by Timestamp)?
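A rough sketch of that check, written as a plain function so it can be tried without a watcher. Everything here is an assumption layered on the observations above: `ChangePattern`, `ChangeClassifier` and the counts used to tell the patterns apart are hypothetical, and a timed batch can split one copy across two windows, so treat the result as a hint rather than a guarantee. The input is the list of Changed timestamps collected for a single file, for example the Timestamp values of one timed Buffer batch:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical names; the counts mirror the observations in this answer, not a documented API.
public enum ChangePattern { NoActivity, CopyInProgress, CopyFinished, RepeatedEdits }

public static class ChangeClassifier
{
    // changedTimes: timestamps of the Changed events seen for ONE file.
    public static ChangePattern Classify(IReadOnlyList<DateTimeOffset> changedTimes) =>
        changedTimes.Count switch
        {
            0 => ChangePattern.NoActivity,
            1 => ChangePattern.CopyInProgress, // start event only, end not seen yet
            2 => ChangePattern.CopyFinished,   // start/end pair, as seen when copying
            _ => ChangePattern.RepeatedEdits   // e.g. an application saving in several steps
        };

    // Time elapsed since the last Changed event. Requires a non-empty list
    // (Max throws InvalidOperationException on an empty one).
    public static TimeSpan QuietFor(IReadOnlyList<DateTimeOffset> changedTimes, DateTimeOffset now) =>
        now - changedTimes.Max();
}
```

A caller would then wait for `CopyFinished`, or for `RepeatedEdits` followed by a long enough `QuietFor`, before touching the file.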