Search code examples
f#deedle

Cross join with Deedle


I'm trying to learn some F# and Deedle by analyzing my electricity costs.

Suppose I have two frames, one containing my electricity usage:

let consumptionsByYear = 
  [ (2019, "Total", 500); (2019, "Day", 200); (2019, "Night", 300);
    (2020, "Total", 600); (2020, "Day", 250); (2020, "Night", 350) ]
  |> Frame.ofValues
        Total Day Night 
2019 -> 500   200 300   
2020 -> 600   250 350   

The other contains two plans with different pricing structure (either a flat fee or fee varying based on the time of the day):

let prices = 
  [ ("Plan A", "Base fee", 50); ("Plan A", "Fixed price", 3); ("Plan A", "Day price", 0); ("Plan A", "Night price", 0);
    ("Plan B", "Base fee", 40); ("Plan B", "Fixed price", 0); ("Plan B", "Day price", 5); ("Plan B", "Night price", 2) ]
  |> Frame.ofValues
          Base fee Fixed price Day price Night price 
Plan A -> 50       3           0         0           
Plan B -> 40       0           5         2           

Previously I have solved this in SQL using a cross join and in Excel using nested joins. To copy those, I found Frame.mapRows, but constructing the expected output seems very tedious using it:

let costs = consumptionsByYear
            |> Frame.mapRows (fun _year cols ->
                ["Total price" => (prices?``Base fee``
                    + (prices?``Fixed price`` |> Series.mapValues ((*) (cols.GetAs<float>("Total"))))
                    + (prices?``Day price`` |> Series.mapValues ((*) (cols.GetAs<float>("Day"))))
                    + (prices?``Night price`` |> Series.mapValues ((*) (cols.GetAs<float>("Night"))))
                    )]
                |> Frame.ofColumns)
            |> Frame.unnest
               Total price 
2019 Plan A -> 1550        
     Plan B -> 1640        
2020 Plan A -> 1850        
     Plan B -> 1990        

Is there a better way or even small improvements?


Solution

  • I'm not a Deedle expert, but I think this is basically:

    • A dot product of two matrices: consumptionsByYear and the periodic day/night prices,
    • Followed by the addition of the constant base prices.

    In other words:

     consumptionsByYear      periodicPrices               basePrices
     -------------------     ------------------------     ---------------------------
    |         Day Night |   |          Plan A Plan B |   |             Plan A Plan B |
    | 2019 -> 200 300   | * | Day   -> 3      5      | + | Base fee -> 50     40     |
    | 2020 -> 250 350   |   | Night -> 3      2      |    ---------------------------
     -------------------     ------------------------
    

    With that approach in mind, here's how I would do it:

    open Deedle
    open Deedle.Math
    
    let consumptionsByYear = 
        [ (2019, "Day", 200); (2019, "Night", 300)
          (2020, "Day", 250); (2020, "Night", 350) ]
        |> Frame.ofValues
    
    let basePrices =
        [ ("Plan A", "Base fee", 50)
          ("Plan B", "Base fee", 40) ]
        |> Frame.ofValues
        |> Frame.transpose
    
    let periodicPrices =
        [ ("Plan A", "Day", 3); ("Plan A", "Night", 3)
          ("Plan B", "Day", 5); ("Plan B", "Night", 2) ]
        |> Frame.ofValues
        |> Frame.transpose
    
    // repeat the base prices for each year
    let basePricesExpanded =
        let row = basePrices.Rows.["Base fee"]
        consumptionsByYear
            |> Frame.mapRowValues (fun _ -> row)
            |> Frame.ofRows
    
    let result =
        Matrix.dot(consumptionsByYear, periodicPrices) + basePricesExpanded
    result.Print()
    

    Output is:

            Plan A Plan B
    2019 -> 1550   1640
    2020 -> 1850   1990
    

    A few changes I made for simplicity:

    • consumptionsByYear
      • I mapped the years from integers to strings in order to make the matrices compatible.
      • I removed the Total column, since it can be derived from the other two.
    • prices
      • I broke this into two separate frames: one for the periodic prices and another for the base prices, and then transposed them to enable matrix multiplication.
      • I changed Day price to Day and Night price to Night to make the matrices compatible.
      • I got rid of the Fixed price column, since it can be represented in the Day and Night columns.

    Update: As of Deedle 2.4.2, it is no longer necessary to map the years to strings. I've modified my solution accordingly.