Search code examples
c#linq

Filter a list with intermittent consecutive duplicate values based on timestamp


I need the values alternating between 1 and 0. I need to filter out and keep value with highest timestamp if the values are repeated.

For the following list

timestamp value
08:10 0
08:15 0
08:20 1
08:25 0
08:30 1
08:35 1
08:50 1
08:55 0

I need the filtered list to look like this:

timestamp value
08:15 0
08:20 1
08:25 0
08:50 1
08:55 0

I used a for loop, but I'm looking for a more efficient solution.

class DataPoint
{
    public TimeSpan Timestamp { get; set; }
    public int Value { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        List<DataPoint> dataPoints = new List<DataPoint>
        {
            new DataPoint { Timestamp = TimeSpan.Parse("08:10"), Value = 0 },
            new DataPoint { Timestamp = TimeSpan.Parse("08:15"), Value = 0 },
            new DataPoint { Timestamp = TimeSpan.Parse("08:20"), Value = 1 },
            new DataPoint { Timestamp = TimeSpan.Parse("08:25"), Value = 0 },
            new DataPoint { Timestamp = TimeSpan.Parse("08:30"), Value = 1 },
            new DataPoint { Timestamp = TimeSpan.Parse("08:35"), Value = 1 },
            new DataPoint { Timestamp = TimeSpan.Parse("08:50"), Value = 1 },
            new DataPoint { Timestamp = TimeSpan.Parse("08:55"), Value = 0 }
        };

        var filteredData = new List<DataPoint>();

        for (int i = 0; i < dataPoints.Count; i++)
        {
            var currentDataPoint = dataPoints[i];
            var segmentEndIndex = i;

            while (segmentEndIndex < dataPoints.Count - 1 && dataPoints[segmentEndIndex + 1].Value == currentDataPoint.Value)
            {
                segmentEndIndex++;
            }

            var maxTimestampDataPoint = dataPoints
                .Skip(i)
                .Take(segmentEndIndex - i + 1)
                .OrderByDescending(d => d.Timestamp)
                .First();
            filteredData.Add(maxTimestampDataPoint);

            i = segmentEndIndex;
        }

        foreach (var dataPoint in filteredData)
        {
            Console.WriteLine($"Timestamp: {dataPoint.Timestamp}, Value: {dataPoint.Value}");
        }
    }
}

Solution

  • I had posted my solution before I saw your edit saying that you already had a working solution with a for loop and you were looking for a more efficient solution. But since you have not posted your solution, it is impossible to judge whether my solution is more efficient.

    Also, you didn't specify what you would like to improve: speed, memory consumption, code quality?


    My solution assumes that list is ordered by timestamp in ascending order. If not sort the list by the timestamp first.

    For a quick test a made a list of tuples (uses a collection expression introduced in C#12):

    List<(TimeOnly timestamp, int value)> list = [
        (new(08, 10), 0),
        (new(08, 15), 0),
        (new(08, 20), 1),
        (new(08, 25), 0),
        (new(08, 30), 1),
        (new(08, 35), 1),
        (new(08, 50), 1),
        (new(08, 55), 0),
    ];
    

    My algorithm creates a result list called filtered by adding or updating elements depending on whether we have a transition between 0/1 or not.

    List<(TimeOnly timestamp, int value)> filtered = new();
    filtered.Add(list[0]);
    for (int i = 1; i < list.Count; i++) {
        var item = list[i];
        if (item.value == filtered[^1].value) {
            filtered[^1] = item; // Same value: update with higher timestamp.
        } else {
            filtered.Add(item); // 0 -> 1 or 1 -> 0 transition: add item.
        }
    }
    

    Note that filtered[^1] denotes the last item in the list. It uses C#'s Indices and ranges feature.


    Test output (Console application):

    foreach (var item in filtered) {
        Console.WriteLine(item);
    }
    

    Produces:

    (08:15, 0)
    (08:20, 1)
    (08:25, 0)
    (08:50, 1)
    (08:55, 0)
    

    Having seen your code, I can say that my approach is much more efficient. You have a nested loop within the outer loop, then you have LINQ Skip and Take statements which are using loops internally. Skip starts looping at the beginning of the list every time. Together with the outer loop, this gives a O(n²) time complexity, compared to only O(n) for my approach. What this means is that if, for example, you have a list which is 10 times larger, then your algorithm will be 10∙10 = 100 times slower, while mine will be only 10 times slower.

    You also have a OrderByDescending having O(n∙log(n)) time complexity. You could simply have replaced OrderByDescending and First by .Last() having only O(n) (the input list must be ordered anyway otherwise your requirement doesn't make sense).

    See: Big O notation (Wikipedia)