
How to filter or drop a value based on the previous one using Deedle in C#?


I am dealing with data from sensors. Sometimes these sensors have blackouts and brownouts; as a consequence, I can end up with the following kind of time series in a Frame, let's call it "myData":

[7.438984; 0.000002; 7.512345; 0.000000; 7.634912; 0.005123; 7.845627...]

Because I only need three decimal places of precision, I rounded the data in the frame:

var myRoundedData = myData.ColumnApply((Series<DateTime, double> numbers) => numbers.Select(kvp => Math.Round(kvp.Value, 3)));

I then took the column from the frame and filtered out the zeros (0.000):

var myFilteredTimeSeries = from kvp in myTimeSeries where kvp.Value != 0.000 select kvp;

So my time series is only partially filtered: [7.439; 7.512; 7.635; 0.005; 7.846...]

However, the value 0.005 is not valid!

How could I implement an elegant filter based on the previous value, something like a "percent limit" on the rate of change:

if (0.005 / 7.635) * 100 < 0.1 then ---> drop / delete(0.005)


Solution

  • The key is to focus on methods that involve a value and its "neighbourhood", as @tomaspetricek pointed out earlier (thanks!). My goal was to find the "noise-free" timestamps (keys), build a Frame from them, and then perform an AddColumn operation, which is by nature a JoinKind.Left operation.

    To solve the problem I used the Pairwise() method, which pairs each value with its neighbour: "Item1" is the previous value and "Item2" is the current value:

    double filterSensibility = 5.0; // threshold, in percent
    
    var myBooleanFilteredTimeSeries = myTimeSeries.Pairwise().Select(kvp => (kvp.Value.Item2 / kvp.Value.Item1) * 100 < filterSensibility);
    

    This let me write exactly the relation I wanted (see the question). For the example time series above, I got:

    myBooleanFilteredTimeSeries = [FALSE; FALSE; FALSE; TRUE; FALSE...]

    TRUE means that the value is noisy, so I keep only the FALSE entries:

    var myDateKeysModel = from kvp in myBooleanFilteredTimeSeries where kvp.Value == false select kvp;
    

    I created a frame from this last Time Series:

    var myCleanDateTimeKeysFrame = Frame.FromRecords(myDateKeysModel);
    

    Finally, I added the original (noisy) time series to the newly created frame:

    myCleanDateTimeKeysFrame.AddColumn("Column Title", myOriginalTimeSeries);
    

    ...et voilà!
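
For anyone who wants to try the same "compare with the previous value" rule without Deedle, here is a minimal sketch in plain C#. It is not the code from the answer above: `NoiseFilter` and `DropRelativeToPrevious` are hypothetical names made up for illustration, and the sketch assumes the zeros have already been removed and that the observations are sorted by time. It always keeps the first observation, since that one has no predecessor to compare against.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NoiseFilter
{
    // Hypothetical helper (not part of Deedle): keeps an observation only when it is
    // at least `minPercentOfPrevious` percent of the observation right before it.
    // Assumes zeros were filtered out beforehand, as in the question.
    public static List<KeyValuePair<DateTime, double>> DropRelativeToPrevious(
        IReadOnlyList<KeyValuePair<DateTime, double>> observations,
        double minPercentOfPrevious)
    {
        var kept = new List<KeyValuePair<DateTime, double>>();

        for (int i = 0; i < observations.Count; i++)
        {
            // The first observation has no predecessor, so it is kept as-is.
            if (i == 0)
            {
                kept.Add(observations[i]);
                continue;
            }

            double previous = observations[i - 1].Value;
            double current = observations[i].Value;

            // Same relation as in the answer: a value is noisy when
            // (current / previous) * 100 falls below the threshold.
            bool isNoisy = (current / previous) * 100 < minPercentOfPrevious;

            if (!isNoisy)
            {
                kept.Add(observations[i]);
            }
        }

        return kept;
    }
}
```

With the rounded example values from the question and a threshold of 5.0, this keeps 7.439, 7.512, 7.635 and 7.846 and drops the near-zero reading between them. Like the pairwise approach, it compares against the immediately preceding raw value, so two noisy readings in a row would need a different rule (for example, comparing against the last value that was kept).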
