Search code examples
f#deedle

f# deedle aggregate row values base on column key


say I have the following Frame,

type Person = 
   { Name:string; Age:int; Comp1:float; Comp2:float }

let peopleRecds = 
    [ { Name = "Joe"; Age = 51; Comp1=12.1; Comp2 =20.3 }
      { Name = "Tomas"; Age = 28; Comp1=1.1; Comp2 =29.3 }
      { Name = "Eve"; Age = 2; Comp1=2.1; Comp2 =40.3 }
      { Name = "Suzanne"; Age = 15; Comp1=12.4; Comp2 =26.3} ]
let peopleList = Frame.ofRecords peopleRecds

What I would like to do is to sum Comp1 and Comp2 columns into a new column for peopleList, and the n for Comp(n) is undecided at the moment, so I cant just to know only sum two columns, there might be Comp3, Comp4, so have to base on regular expression, something like key is leading with Comp.

It seems what I should do is to mapRowValues on each row

   peopleList?TotalComp <- peopleList |>Frame.mapRowValues(
                                          fun row -> 
                                              (do something to sum up)
                                         )

However I am not sure how to operate at the row level here.


Solution

  • If it is planned to increase the number of 'Comp' , it might be better to put them in an array:

    type nPerson =  { Name:string; Age:int; Comp:float[] }
    
    let npeopleRecds = 
        [ { Name = "Joe"; Age = 51; Comp = [| 12.1; 20.3 |] }
          { Name = "Tomas"; Age = 28; Comp = [| 1.1; 29.3  |] }
          { Name = "Eve"; Age = 2; Comp = [| 2.1; 40.3  |] }
          { Name = "Suzanne"; Age = 15; Comp = [|12.4; 26.3 |] } ]
    

    Then you can easily add a column with the sum of:

    let npeopleList = Frame.ofRecords npeopleRecds
    
    npeopleList.Format() |> printfn "%s"
    
    let sumseries = npeopleList.GetColumn<float []>("Comp") |> Series.mapValues(fun x -> x |> Array.sum)
    
    npeopleList?TotalComp <- sumseries
    
    npeopleList.Format() |> printfn "%s"
    

    Print:

         Name    Age Comp
    0 -> Joe     51  System.Double[]
    1 -> Tomas   28  System.Double[]
    2 -> Eve     2   System.Double[]
    3 -> Suzanne 15  System.Double[]
    
         Name    Age Comp            TotalComp
    0 -> Joe     51  System.Double[] 32,4
    1 -> Tomas   28  System.Double[] 30,4
    2 -> Eve     2   System.Double[] 42,4
    3 -> Suzanne 15  System.Double[] 38,7
    

    Edit:

    If the change the fields in the Person record is unacceptable - you can use filters:

    let allSum = 
        peopleList.Columns
        |> Series.filter(fun k _ -> k.StartsWith("Comp"))
        |> Frame.ofColumns
        |> Frame.rows
        |> Series.mapValues(Series.foldValues(fun acc v -> acc + (v :?> float)) 0.0)
    
    
    peopleList?TotalComp <- allSum
    
    peopleList.Format() |> printfn "%s"
    

    Print:

         Name    Age Comp1 Comp2 TotalComp
    0 -> Joe     51  12,1  20,3  32,4
    1 -> Tomas   28  1,1   29,3  30,4
    2 -> Eve     2   2,1   40,3  42,4
    3 -> Suzanne 15  12,4  26,3  38,7