Search code examples
f#deedle

Deedle missing values after grouping


I have two frames, each of which contains some IDs and zero to many measures for each ID. I want to get the average measure per ID for each frame and combine to a larger frame.

The problem is that when an ID does not appear in one of the two frames, after grouping it results in a missing value in the combined frame. Here is an example. Notice ID "Chris" does not appear in frame A.

let aF = frame [ "AID" =?> Series.ofValues [ "Andrew"; "Andrew"; "Andrew"]; "AMES" =?> Series.ofValues [  2; 4; 3]]
let bF = frame [ "BID" =?> Series.ofValues [ "Andrew"; "Chris"; "Andrew"];  "BMES" =?> Series.ofValues [ 1; 6; 7]]
let groupF = frame [ "AG" => (aF |> Frame.groupRowsByString "AID" |> Frame.getCol "AMES") ; "BG" => (bF |> Frame.groupRowsByString "BID" |> Frame.getCol "BMES") ]
let groupFMean = groupF |> Frame.getNumericCols |> Series.mapValues (Stats.levelMean fst) |> Frame.ofColumns |> Frame.fillMissingWith 0
groupFMean.SaveCsv( "tgroupFMean.csv", includeRowKeys=true, keyNames=["Id"] )

The resulting table looks like this:

Id      AG  BG
Andrew  3   4
Chris   6

And the blank cell is "". I've tried variations with fillMissingWith 0 (at series and and frame level) without success.


Solution

  • The answer is not very obvious - the problem is that fillMissingWith only touches columns that have the same type as the value you are using to fill the data - so for example, fillMissingWith "Unknown" would only fill missing values in columns that are string.

    In your case, Frame.fillMissingWith 0 is only applied to columns of type int and there are no such columns. If you use Frame.fillMissingWith 0.0, things work as expected!

    PS: If you have any thoughts on how this could be done better, please let us know. I'm really not sure what the right behavior is here!