Search code examples
c#dataframemissing-datadeedle

How to forward fill missing values in a C# data frame


I am trying to forward fill values in a Deedle C# data frame, like the equivalent of using the pandas.ffill() function in python. The forward fill in pandas, allows for forward filling at both row and column indexes. I want do the same in Deedle, to take the nearest non missing value on the same row but from a different column, so across rows, which is very simple in python and pandas. I'm new to both C# and the Deedle library, and all I can find in the documentation is how to forward fill values within a series not across a data frame. Any help is greatly appreciated, I'm new to this, and I can't seem to find any substantial examples for this library anywhere.

This is an example of my data frame and I am trying to forward fill the values across the rows. So, the filled value for the missing value in col3 would become the last value in same row of col2 etc.

-------- rawDF ------
col1 col2 col3
AAA  BBB    
CCC  DDD  EEE
FFF     
BBB  AAA    
DDD  CCC    
EEE  FFF
AAA     
BBB  CCC    
AAA     
FFF  AAA    
DDD 

The examples which are provided in the Deedle documentation for both C# and F#, only fill the values within a series as shown. If I attempt to FillMissing(); across the data frame I get nothing returned at all.

//Fill with previous available value in the series
var fillFwd = col2.FillMissing(Direction.Forward);
fillFwd.Print();

//Fill with the next available value
var fillBwd = col2.FillMissing(Direction.Backward);
fillBwd.Print()

---- nothing happens with the following ----

//forward fill all values in the DF
rawDF.FillMissing(Direction.Forward);
rawDF.Print();

//backward fill all values in the DF

//fill values in the DF with a constant value
rawDF.FillMissing(0);
rawDF.Print();

Solution

  • First of all, your second group of examples where you call rawDf.FillMissing and then print rawDF do not do anything because Deedle data frames are (mostly) immutable. Operations like FillMissing return a new (filled) data frame and do not modify the original one. So, the right way of using the operations is as in your first group of examples where you assing the result to a new variable and then print that.

    Second problem I encountered was that when you read your sample data from (say) a CSV file, the empty strings in the file will not be treated as missing values but as valid (empty) string values. Deedle can treat those as missing values if you use Select and turn empty strings to null:

    var df = Frame.ReadCsv("C:/temp/aa.csv");
    var dfEmpty = df.SelectValues((string s) => (s == "") ? null : s);
    

    Now you should see something happening when you use the FillMissing operation as you did:

    var fillFwd = dfEmpty.FillMissing(Direction.Backward);
    fillFwd.Print();
    

    As you correctly pointed out, this does not do what you want - it fills values from top to bottom. You wanted to fill data from left to right. One way to do this is to use Select which lets you transform each row individually:

    var fillRight = Frame.FromRows(dfEmpty.Rows.Select(row => 
      row.Value.FillMissing(Direction.Forward)));
    fillRight.Print();
    

    You could also transpose the data frame, then fill the missing values and then transpose it back (which is quite nice way of expressing what you need, but trnasposition might be a bit more time consuming than just using Select):

    var fillRight = dfEmpty.Transpose().FillMissing(Direction.Forward).Transpose()