Search code examples
c#datatable

c# Datatable ( deleting Duplicate records within 60 seconds of each other , keeping only the first )


I have a C# Datatable with three fields. Where I want to only keep the first records that were entered more than 60 seconds within the other similar records after that.

Similar being defined as:

Additional to the above condition the ID numbers and billing code must be the same to be excluded in the final table.

so my example below would be the input

|IdNumber(string) |EntryDate(DateTime)|BillingCode(Int)|

| 5678 | 2021/06/16 08:46:37 | 56 |

| 5678 | 2021/06/16 08:46:54 | 56 |

| 5678 | 2021/06/16 08:47:16 | 56 |

| 5678 | 2021/06/16 08:47:54 | 56 |

| 5678 | 2021/06/16 08:47:59 | 57 |

| 5691 | 2021/06/16 08:47:59 | 57 |

| 5694 | 2021/06/16 08:48:00 | 57 |

And the resultant output should look like

|IdNumber(string) |EntryDate(DateTime)|BillingCode(Int)|

| 5678 | 2021/06/16 08:46:37 | 56 |

| 5678 | 2021/06/16 08:47:54 | 56 |

| 5678 | 2021/06/16 08:47:59 | 57 |

| 5691 | 2021/06/16 08:47:59 | 57 |

| 5694 | 2021/06/16 08:48:00 | 57 |

I have struggled with this for the past day and cannot wrap my head around a solution.

Any suggestions either Linq or Datatable select queries will do?

Johan


Solution

  • Simple Linq Query would be:

    var toDelete =
        stuff.GroupBy(x => new { x.IdNumber, x.BillingCode })
            .SelectMany(g =>
                g.OrderBy(x => x.EntryDate)
                .Aggregate(
                    new { toKeep = new List<Thing>(), toDelete = new List<Thing>() },
                    (acc, record) =>
                    {
                        var shouldDeleteRecord =
                            acc.toKeep.Any(x => x.EntryDate.AddSeconds(60) > record.EntryDate);
    
                        if (shouldDeleteRecord)
                        {
                            acc.toDelete.Add(record);
                        } 
                        else
                        {
                            acc.toKeep.Add(record);
                        }
    
                        return acc;
                    }
                ).toDelete
            );
    
    // todo: delete your stuff here
    

    The idea being, group them by IdNumber and BillingCode, then order them by Entry Date. Loop through each item in each ordered group, and keep the first one that's outside of the 60seconds boundary imposed by the last one. Delete the rest.

    Would be much cleaner, and easier to maintain if you wrote it as loops instead of using just Linq though.

    EDIT: sorry, forgot to say, I'm using Thing as a stand in for whatever your data is called:

    class Thing
    {
        public DateTime EntryDate { get; set; }
        public string IdNumber { get; set; }
        public int BillingCode { get; set; }
    }
    

    and stuff is an IEnumerable<Thing> (standin for your table).