I would like to remove duplicate entries from a queue in an efficient way. The queue has a custom class with DateTime and FullPath and a few other things
private Queue<MyCustomClass> SharedQueue;
The DateTime in the class is the timestamp when inserted into the queue. The logic I would like to use is as following: Remove duplicates from the queue if the FullPath is identical within a 4 second window (i.e. if added to queue within 4 seconds of a duplicate fullpath). I have the events that I want to watch but a few duplicates will still arrive and that is OK.
I am using c# 2.0 and the FileSystemWatcher class and a worker queue.
There are a bunch of ways to do this: Trim the queue each time an item is added to it, or when I am working on the queue skip the processing of the current duplicate item.
Or should I use a 'global private' variable Dictionary< String, DateTime> ? So I can quickly search it? or a local copy of the queue ? Perhaps it is best to limit the local queue to 100 items in case of many file events? Though in my case it 'should be' only a relatively few files to monitor in a folder... but things always change...
Thanks for any help.
:Edit: Feb 10 8:54 EST: So I decided to implement a good simple solution as far as I can tell. I don't think I am holding on to the Dict keys too long...
:Edit: Feb 10 9:53 EST: Updated as my Dictionary cannot contain duplicate values.
public void QueueInput(HotSynchUnit.RcdFSWFile rcd)
// start the worker thread when program starts.
// call Terminate.Set() in the programs exit routine or close handler etc.
{
// lock shared queue
lock (SharedQueue)
{
if (!IsDuplicateQueueInput(rcd)) // only add unique values to queue
{
SharedQueue.Enqueue(rcd);
SomethingToDo.Set();
}
}
} // public void QueueInput
private bool IsDuplicateQueueInput(HotSynchUnit.RcdFSWFile rcd)
/* Return true if the object is a duplicate object.
* Pseudo Code:
*
* isDuplicate = false
* Lock Dictionary
* -If lastTimeStamp > 4 seconds ago then // Optimization: save lastTimeStamp
* if Dict.Count > 0 then clear Dictionary
* return isDuplicate
* -If not Dict.TryGetValue(sPath, dtTimeStamp) then
* Dict.AddKey()
* -Else
* Compare key timestamp to Currenttime
* if key timestamp is <= 4 seconds ago then
* IsDuplicate = True
*
* Dict.RemoveKey()
* Dict.AddKey()
*
* return isDuplicate
*/
{
// put real code here
}
I just thought about using any collection similar to a generic hashtable... Something like this:
Dictionary<string, YourClass> dict = new Dictionary<string, YourClass>();
/// just let's assume you want to add/check for "c:\demo.txt"
if (!dict.ContainsKey(@"c:\demo.txt"))
{
/// add items to dict by passing fullPath as key and your objects as value
dict.add(@"c:\demo.txt", obj1);
}
else if (dict[@"c:\demo.txt"].CheckForIntervall())
{
/// replace current object in dictionary with new object - in case you want to..
/// or just do what you want to
}
edit - your custom class may have some functionality like this:
class YOURCUSTOMCLASS
{
private DateTime creationTime;
public DateTime CreationTime
{ get { return creationTime; } }
public YOURCUSTOMCLASS(parametersGoesHere xyz)
{
creationTime = DateTime.Now;
}
/// in this case this method will return true
/// if the timeSpan between this object and otherObject
/// is greater than 4 seconds
public bool CheckForInterval(YOURCUSTOMCLASS otherObject)
{
TimeSpan diff = otherObj.CreationTime.Subtract(creationTime);
/// you may replace 4 through any other digit, or even better take
/// a const/global var/static ...
return diff.TotalSeconds > 4;
}
/// all the other stuff you need ...
}
Of course you will loose the functionality of a queue - but you will get an massive increase in runtime if your queue containts many elements.
hth