Search code examples
c#queuefilesystemwatcherno-duplicates

How to remove duplicate items from a queue within a time frame?


I would like to remove duplicate entries from a queue in an efficient way. The queue has a custom class with DateTime and FullPath and a few other things

private Queue<MyCustomClass> SharedQueue;

The DateTime in the class is the timestamp when inserted into the queue. The logic I would like to use is as following: Remove duplicates from the queue if the FullPath is identical within a 4 second window (i.e. if added to queue within 4 seconds of a duplicate fullpath). I have the events that I want to watch but a few duplicates will still arrive and that is OK.

I am using c# 2.0 and the FileSystemWatcher class and a worker queue.

There are a bunch of ways to do this: Trim the queue each time an item is added to it, or when I am working on the queue skip the processing of the current duplicate item.

Or should I use a 'global private' variable Dictionary< String, DateTime> ? So I can quickly search it? or a local copy of the queue ? Perhaps it is best to limit the local queue to 100 items in case of many file events? Though in my case it 'should be' only a relatively few files to monitor in a folder... but things always change...

Thanks for any help.

:Edit: Feb 10 8:54 EST: So I decided to implement a good simple solution as far as I can tell. I don't think I am holding on to the Dict keys too long...

:Edit: Feb 10 9:53 EST: Updated as my Dictionary cannot contain duplicate values.

   public void QueueInput(HotSynchUnit.RcdFSWFile rcd)
// start the worker thread when program starts.
// call Terminate.Set() in the programs exit routine or close handler etc.
{
  // lock shared queue
  lock (SharedQueue)
  {
    if (!IsDuplicateQueueInput(rcd))  // only add unique values to queue
    {
      SharedQueue.Enqueue(rcd);
      SomethingToDo.Set();
    }
  }
} // public void QueueInput

private bool IsDuplicateQueueInput(HotSynchUnit.RcdFSWFile rcd)
/* Return true if the object is a duplicate object.
 * Pseudo Code:
 * 
 * isDuplicate = false
 * Lock Dictionary
 * -If lastTimeStamp > 4 seconds ago then       // Optimization: save lastTimeStamp
 *    if Dict.Count > 0 then clear Dictionary
 *    return isDuplicate
 * -If not Dict.TryGetValue(sPath, dtTimeStamp) then
 *    Dict.AddKey()
 * -Else
 *    Compare key timestamp to Currenttime
 *    if key timestamp is <= 4 seconds ago then
 *       IsDuplicate = True
 *
 *    Dict.RemoveKey()
 *    Dict.AddKey()
 * 
 * return isDuplicate
*/
{
  // put real code here
}

Solution

  • I just thought about using any collection similar to a generic hashtable... Something like this:

    Dictionary<string, YourClass> dict = new Dictionary<string, YourClass>();
    
    /// just let's assume you want to add/check for "c:\demo.txt"
    
    if (!dict.ContainsKey(@"c:\demo.txt"))
    {
       /// add items to dict by passing fullPath as key and your objects as value
       dict.add(@"c:\demo.txt", obj1);
    } 
    else if (dict[@"c:\demo.txt"].CheckForIntervall())
    {
       /// replace current object in dictionary with new object - in case you want to..
       /// or just do what you want to 
    }
    

    edit - your custom class may have some functionality like this:

    class YOURCUSTOMCLASS
    {
        private DateTime creationTime;
    
        public DateTime CreationTime
        { get { return creationTime; } }
    
        public YOURCUSTOMCLASS(parametersGoesHere xyz)
        {
              creationTime = DateTime.Now;
        }
    
        /// in this case this method will return true
        /// if the timeSpan between this object and otherObject
        /// is greater than 4 seconds
        public bool CheckForInterval(YOURCUSTOMCLASS otherObject)
        {
             TimeSpan diff = otherObj.CreationTime.Subtract(creationTime);
    
             /// you may replace 4 through any other digit, or even better take
             /// a const/global var/static ...
             return diff.TotalSeconds > 4;
        }
    
        /// all the other stuff you need ...
    }
    

    Of course you will loose the functionality of a queue - but you will get an massive increase in runtime if your queue containts many elements.

    hth