Search code examples
f#deedle

How to join frames using F#'s Deedle where one of the frame has a composite key?


Say I have two frames, firstFrame (Frame<(int * int),string>) and secondFrame (Frame<int,string>). I'd like to find a way to join the frames such that the values from the first part of the composite key from firstFrame match the values from the key in secondFrame.

The following is an example of the frames that I'm working with:

val firstFrame : Deedle.Frame<(int * int),string> =

       Premia 
1 1 -> 125    
2 1 -> 135    
3 1 -> 169    
1 2 -> 231    
2 2 -> 876    
3 2 -> 24     

val secondFrame : Deedle.Frame<int,string> =

     year month 
1 -> 2014 Apr   
2 -> 2014 May   
3 -> 2014 Jun   

Code used to generate the sample above:

#I @"w:\\\dev\packages\Deedle.0.9.12"
#load "Deedle.fsx"
open Deedle
open System

let periodMembers =[(1,1);(2,1);(3,1);(1,2);(2,2);(3,2);]
let premia =[125;135;169;231;876;24;]
let firstSeries = Series(periodMembers,premia)
let firstFrame = Frame.ofColumns["Premia"=>firstSeries]

let projectedYears = series([1=>2014;2=>2014;3=>2014;])
let projectedMonths = series([1=>"Apr";2=>"May";3=>"Jun"])
let secondFrame = Frame(["year";"month"],[projectedYears;projectedMonths;])

Solution

  • Great question! This is not as easy as it should be (and it is probably related to another question about joining frames that we recorded as an issue). I think there should be a nicer way to do this and I'll add a link to this question to the issue.

    That said, you can use the fact that joining can align keys that do not match exactly to do this. You can first add zero as the second element of the key in the second frame:

    > let m = secondFrame |> Frame.mapRowKeys (fun k -> k, 0);;
    
    val m : Frame<(int * int),string> =      
           year month 
    1 0 -> 2014 Apr   
    2 0 -> 2014 May   
    3 0 -> 2014 Jun   
    

    Now, the key in the second frame is always smaller than the matching keys in the first frame (assuming the numbers are positive). So, e.g. we want to align a value n the second frame with key (1, 0) to values in the first frame with keys (1, 1), (1, 2), ... You can use Lookup.NearestSmaller to tell Deedle that you want to find a value with the nearest smaller key (which will be (1, 0) for any key (1, k)).

    To use this, you first need to sort the first frame, but then it works nicely:

    > firstFrame.SortByRowKey().Join(m, JoinKind.Left, Lookup.NearestSmaller);;
    
    val it : Frame<(int * int),string> =      
           Premia year month 
    1 1 -> 125    2014 Apr   
      2 -> 231    2014 Apr   
    2 1 -> 135    2014 May   
      2 -> 876    2014 May   
    3 1 -> 169    2014 Jun   
      2 -> 24     2014 Jun   
    

    This is not particularly obvious, but it does the trick. Although, I hope we can come up with a nicer way!