c# xml linq duplicates xelement

XML node Latest timestamp C#


I need to print each employee only once, together with its latest timestamp. In other words, each employee ID should appear exactly once, paired with its most recent ChangeTimeStamp.

This is my result so far; the same ID is printed several times with different timestamps (example):

1 2015-03-16T21:32:30
1 2015-03-16T21:33:30
2 2015-03-16T21:32:30
3 2015-03-16T21:32:30
2 2015-03-16T21:33:30

This is what my code looks like right now.

    static void Main(string[] args)
    {
        XElement xelement = XElement.Load("data.xml");
        IEnumerable<XElement> employees = xelement.Elements();
        Console.WriteLine("List of Employee and latest timestamp:");
        foreach (var employee in employees)
        {
            Console.WriteLine("{0} has Employee ID {1}",
                employee.Element("Employee").Value,
                employee.Element("ChangeTimeStamp").Value);
        }
        Console.ReadKey();
    }

To be clear, I would like my result to be:

1 2015-03-16T21:33:30
2 2015-03-16T21:33:30
3 2015-03-16T21:32:30

Solution

  • You need to group your employee nodes by the value of the Employee element, like this:

    XElement xelement = XElement.Load("data.xml");
    var employees = xelement.Elements()
                // Element names are case-sensitive: the question uses "ChangeTimeStamp".
                .Select(e => new
                {
                    Name = e.Element("Employee").Value,
                    ChangeTimestamp = DateTime.Parse(e.Element("ChangeTimeStamp").Value)
                })
                .GroupBy(e => e.Name)
                // Keep one entry per employee, carrying the newest timestamp.
                .Select(g => new
                {
                    Name = g.Key,
                    ChangeTimestamp = g.Max(e => e.ChangeTimestamp)
                });
    

    Now, when you iterate, you should get the proper values:

    foreach(var employee in employees)
    {
        Console.WriteLine("{0} {1}", employee.Name, employee.ChangeTimestamp);
    }
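
To try the grouping without the original data.xml, it can be run against an inline document. This is only a sketch: the XML layout below (a root `Employees` element with `Record` children) and the sample values are assumptions, since the question does not show the file. Only the `Employee` and `ChangeTimeStamp` element names come from the question's code. Passing `CultureInfo.InvariantCulture` to `DateTime.Parse` is an addition here so the result does not depend on the machine's regional settings.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sample data mirroring the output shown in the question.
var root = XElement.Parse(@"<Employees>
  <Record><Employee>1</Employee><ChangeTimeStamp>2015-03-16T21:32:30</ChangeTimeStamp></Record>
  <Record><Employee>1</Employee><ChangeTimeStamp>2015-03-16T21:33:30</ChangeTimeStamp></Record>
  <Record><Employee>2</Employee><ChangeTimeStamp>2015-03-16T21:32:30</ChangeTimeStamp></Record>
  <Record><Employee>3</Employee><ChangeTimeStamp>2015-03-16T21:32:30</ChangeTimeStamp></Record>
  <Record><Employee>2</Employee><ChangeTimeStamp>2015-03-16T21:33:30</ChangeTimeStamp></Record>
</Employees>");

// Group the records by employee id and keep the newest timestamp per group.
var latest = root.Elements()
    .GroupBy(e => e.Element("Employee").Value)
    .Select(g => new
    {
        Id = g.Key,
        ChangeTimeStamp = g.Max(e => DateTime.Parse(
            e.Element("ChangeTimeStamp").Value, CultureInfo.InvariantCulture))
    })
    .ToList();

foreach (var item in latest)
{
    // The "s" format specifier prints the sortable form yyyy-MM-ddTHH:mm:ss.
    Console.WriteLine("{0} {1:s}", item.Id, item.ChangeTimeStamp);
}
```

With this sample data, each of the three ids is printed once, and ids 1 and 2 carry the 21:33:30 timestamp. `GroupBy` yields groups in the order their keys first appear, so no extra sorting is needed for this input.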
    

    Edit I

    To remove every element except the latest version of each employee, select the older ones and remove each of them from its parent. The code should look something like this:

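    XElement xelement = XElement.Load("data.xml");
    xelement.Elements()
        .GroupBy(e => e.Element("Employee").Value)
        .SelectMany(g =>
        {
            // Element names are case-sensitive: the question uses "ChangeTimeStamp".
            var maxDate = g.Max(e => DateTime.Parse(e.Element("ChangeTimeStamp").Value));
            // Everything older than the newest timestamp in the group is stale.
            return g.Where(e => DateTime.Parse(e.Element("ChangeTimeStamp").Value) != maxDate);
        })
        .ToList()   // materialize before removing, so the tree is not modified mid-query
        .ForEach(e => e.Remove());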
    

    This is, of course, not the optimal solution, but it should point you in the right direction. Adjust it to fit your needs.
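
One possible way to tighten the removal is to parse each timestamp only once and then drop everything except the newest element of each group. The following is a minimal sketch, not a definitive implementation: the inline XML (root `Employees`, children `Record`) and its values are hypothetical stand-ins for the real data.xml, and `CultureInfo.InvariantCulture` is added so parsing does not depend on regional settings. Unlike the `!= maxDate` filter, this variant keeps exactly one element per employee even when two records share the newest timestamp.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sample data; the real file layout is not shown in the question.
var root = XElement.Parse(@"<Employees>
  <Record><Employee>1</Employee><ChangeTimeStamp>2015-03-16T21:32:30</ChangeTimeStamp></Record>
  <Record><Employee>1</Employee><ChangeTimeStamp>2015-03-16T21:33:30</ChangeTimeStamp></Record>
  <Record><Employee>2</Employee><ChangeTimeStamp>2015-03-16T21:32:30</ChangeTimeStamp></Record>
  <Record><Employee>3</Employee><ChangeTimeStamp>2015-03-16T21:32:30</ChangeTimeStamp></Record>
  <Record><Employee>2</Employee><ChangeTimeStamp>2015-03-16T21:33:30</ChangeTimeStamp></Record>
</Employees>");

var stale = root.Elements()
    // Parse each timestamp once and carry it alongside its element.
    .Select(e => new
    {
        Element = e,
        Id = e.Element("Employee").Value,
        Stamp = DateTime.Parse(e.Element("ChangeTimeStamp").Value, CultureInfo.InvariantCulture)
    })
    .GroupBy(x => x.Id)
    // Newest first; everything after the first entry of a group is stale.
    .SelectMany(g => g.OrderByDescending(x => x.Stamp).Skip(1))
    .Select(x => x.Element)
    .ToList();   // materialize before mutating the tree

foreach (var element in stale)
{
    element.Remove();
}

// Print the remaining tree to inspect which records survived.
Console.WriteLine(root);
```

The surviving elements stay in their original document order, so the output is not sorted by employee id.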

    Edit II

    The file from the paste bin has 3 different employees, not 3 different ChangeTimeStamp values for a single employee. I've tried the following snippet and it worked:

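    XDocument doc = XDocument.Load(@"path");
    doc.Root.Elements("Employee")
        .GroupBy(e => e.Element("Employee").Value)
        // Sort each group newest first and skip the first entry, leaving only the stale ones.
        .SelectMany(g => g
            .OrderByDescending(e => DateTime.Parse(e.Element("ChangeTimeStamp").Value))
            .Skip(1))
        .ToList()   // materialize before removing, so the tree is not modified mid-query
        .ForEach(e => e.Remove());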