dataframe f# deedle

F# Deedle and Multi Index


I have recently started learning F# for data science (coming from basic C# and Python). I am getting used to the power of the functional-first paradigm for scientific work.

However, I am still unsure how to approach a problem that I could easily solve with pandas in Python. It concerns multi-index time series and data frames. I have looked through the Deedle documentation extensively, but I am still not sure whether Deedle can help me build a table like this:

Column Index 1:           A           ||           B
Column Index 2:     A1    |     A2    ||     B1    |     B2
Column Index 3:  p1    p2 |  p1    p2 ||  p1    p2 |  p1    p2

Row Index:
date1            0.5   2. |  2.   0.5 ||  3.    0. |  2.    3.
date2            ......

The idea is to be able to sum, for example, all p1 series where Index 1 = A, and so on.

I did not find an example of this using Deedle.

If it is not available, what data structure would you recommend?

Thanks for helping an F# newbie (who is already in love with the language).


Solution

  • In Deedle, you can create a frame or a series with hierarchical index by using a tuple as the key:

    open Deedle

    // Each key is a (level 1, level 2, level 3) tuple
    let ts = 
      series
       [ ("A", "A1", "p1") => 0.5 
         ("A", "A1", "p2") => 2.
         ("A", "A2", "p3") => 2. 
         ("A", "A2", "p4") => 0.5 ]
    

    Deedle has some special handling for tuple keys. For example, it prints the series above with repeated outer keys omitted:

    A A1 p1 -> 0.5 
         p2 -> 2   
      A2 p3 -> 2   
         p4 -> 0.5 
    

    To aggregate over part of the hierarchy, you can use the Series.applyLevel function:

    ts |> Series.applyLevel (fun (l1, l2, l3) -> l1) Stats.mean
    ts |> Series.applyLevel (fun (l1, l2, l3) -> l1, l2) Stats.mean
    

    The first argument is a function that receives the tuple of keys and returns the part of the key to group by. The second argument is the aggregation applied to each group. The two calls above therefore compute the mean over the top level and over the top two levels, respectively.
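
    The same approach can be adapted to the sum the question asks about ("all p1 series where Index 1 = A"). What follows is only a sketch, with some assumptions: it models a single row of the question's table (date1) as a series keyed by 3-tuples, the names row, sums and sumAP1 are made up for illustration, and the #r line assumes an F# script run with dotnet fsi. The level selector keeps the first and third key levels and drops the second, and Stats.sum is used instead of Stats.mean:

```fsharp
#r "nuget: Deedle"   // assumption: run as an .fsx script with dotnet fsi
open Deedle

// One row (date1) of the table from the question,
// keyed by (column index 1, column index 2, column index 3).
let row =
  series
    [ ("A", "A1", "p1") => 0.5
      ("A", "A1", "p2") => 2.0
      ("A", "A2", "p1") => 2.0
      ("A", "A2", "p2") => 0.5
      ("B", "B1", "p1") => 3.0
      ("B", "B1", "p2") => 0.0
      ("B", "B2", "p1") => 2.0
      ("B", "B2", "p2") => 3.0 ]

// Group by (level 1, level 3), ignoring level 2, and sum each group.
let sums =
  row |> Series.applyLevel (fun (l1, _, l3) -> l1, l3) Stats.sum

// Sum of every p1 value under A: 0.5 + 2.0 = 2.5
let sumAP1 = sums |> Series.get ("A", "p1")
printfn "%A" sumAP1
```

    For a full frame with dates as row keys, one option would be to apply the same grouping to each row in turn, but that is not shown here.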