I am trying to forward fill values in a Deedle C# data frame, like the equivalent of using the pandas.ffill() function in python. The forward fill in pandas, allows for forward filling at both row and column indexes. I want do the same in Deedle, to take the nearest non missing value on the same row but from a different column, so across rows, which is very simple in python and pandas. I'm new to both C# and the Deedle library, and all I can find in the documentation is how to forward fill values within a series not across a data frame. Any help is greatly appreciated, I'm new to this, and I can't seem to find any substantial examples for this library anywhere.
This is an example of my data frame and I am trying to forward fill the values across the rows. So, the filled value for the missing value in col3 would become the last value in same row of col2 etc.
-------- rawDF ------
col1 col2 col3
AAA BBB
CCC DDD EEE
FFF
BBB AAA
DDD CCC
EEE FFF
AAA
BBB CCC
AAA
FFF AAA
DDD
The examples which are provided in the Deedle documentation for both C# and F#, only fill the values within a series as shown. If I attempt to FillMissing(); across the data frame I get nothing returned at all.
//Fill with previous available value in the series
var fillFwd = col2.FillMissing(Direction.Forward);
fillFwd.Print();
//Fill with the next available value
var fillBwd = col2.FillMissing(Direction.Backward);
fillBwd.Print()
---- nothing happens with the following ----
//forward fill all values in the DF
rawDF.FillMissing(Direction.Forward);
rawDF.Print();
//backward fill all values in the DF
//fill values in the DF with a constant value
rawDF.FillMissing(0);
rawDF.Print();
First of all, your second group of examples where you call rawDf.FillMissing
and then print rawDF
do not do anything because Deedle data frames are (mostly) immutable. Operations like FillMissing
return a new (filled) data frame and do not modify the original one. So, the right way of using the operations is as in your first group of examples where you assing the result to a new variable and then print that.
Second problem I encountered was that when you read your sample data from (say) a CSV file, the empty strings in the file will not be treated as missing values but as valid (empty) string values. Deedle can treat those as missing values if you use Select
and turn empty strings to null
:
var df = Frame.ReadCsv("C:/temp/aa.csv");
var dfEmpty = df.SelectValues((string s) => (s == "") ? null : s);
Now you should see something happening when you use the FillMissing
operation as you did:
var fillFwd = dfEmpty.FillMissing(Direction.Backward);
fillFwd.Print();
As you correctly pointed out, this does not do what you want - it fills values from top to bottom. You wanted to fill data from left to right. One way to do this is to use Select
which lets you transform each row individually:
var fillRight = Frame.FromRows(dfEmpty.Rows.Select(row =>
row.Value.FillMissing(Direction.Forward)));
fillRight.Print();
You could also transpose the data frame, then fill the missing values and then transpose it back (which is quite nice way of expressing what you need, but trnasposition might be a bit more time consuming than just using Select
):
var fillRight = dfEmpty.Transpose().FillMissing(Direction.Forward).Transpose()