I have two frames, each of which contains some IDs and zero to many measures for each ID. I want to get the average measure per ID for each frame and combine to a larger frame.
The problem is that when an ID does not appear in one of the two frames, after grouping it results in a missing value in the combined frame. Here is an example. Notice ID "Chris" does not appear in frame A.
let aF = frame [ "AID" =?> Series.ofValues [ "Andrew"; "Andrew"; "Andrew"]; "AMES" =?> Series.ofValues [ 2; 4; 3]]
let bF = frame [ "BID" =?> Series.ofValues [ "Andrew"; "Chris"; "Andrew"]; "BMES" =?> Series.ofValues [ 1; 6; 7]]
let groupF = frame [ "AG" => (aF |> Frame.groupRowsByString "AID" |> Frame.getCol "AMES") ; "BG" => (bF |> Frame.groupRowsByString "BID" |> Frame.getCol "BMES") ]
let groupFMean = groupF |> Frame.getNumericCols |> Series.mapValues (Stats.levelMean fst) |> Frame.ofColumns |> Frame.fillMissingWith 0
groupFMean.SaveCsv( "tgroupFMean.csv", includeRowKeys=true, keyNames=["Id"] )
The resulting table looks like this:
Id AG BG
Andrew 3 4
Chris 6
And the blank cell is "". I've tried variations with fillMissingWith 0 (at series and and frame level) without success.
The answer is not very obvious - the problem is that fillMissingWith
only touches columns that have the same type as the value you are using to fill the data - so for example, fillMissingWith "Unknown"
would only fill missing values in columns that are string.
In your case, Frame.fillMissingWith 0
is only applied to columns of type int
and there are no such columns. If you use Frame.fillMissingWith 0.0
, things work as expected!
PS: If you have any thoughts on how this could be done better, please let us know. I'm really not sure what the right behavior is here!