Search code examples
f#deedle

Finding the Stats.max for each hierarchical index in Deedle


If I have a dataset that contains [City, Dealership, Total Cars Sold]. How would I get the top dealer in each city and the number of cars they sold?

The results should look like

City1 Dealership A 2000
City2 Dealership X 1000
etc.

I'm sure it's possible, but I'm not having any luck and it might because i'm approaching the problem the wrong way.

Currently i'm grouping by Dealership and City which creates a Frame<(string*string*int), int> and that gets me

City1 Dealership A 1 -> 2000
City1 Dealership B 2 -> 1000
City2 Dealership X 3 -> 1000
City2 Dealership Y 4 -> 500
etc.

But trying to then get the dealership that does the most deals is where i'm stumped.

Thanks.


Solution

  • I adapted Tomas's answer and output the type as Series<string, (string * int)>

    let data = series [
      ("City1", "Dealership A") => 2000
      ("City1", "Dealership B") => 1000
      ("City2", "Dealership X") => 1000
      ("City2", "Dealership Y") => 500 ]
    
    data
    |> Series.groupBy (fun k _ -> fst k)
    |> Series.mapValues (fun sr ->
      let sorted = sr |> Series.sortBy(fun x -> -x)
      let key = sorted |> Series.firstKey |> snd
      let value = sorted |> Series.firstValue
      key, value )
    

    The output looks like

    City1 -> (Dealership A, 2000) 
    City2 -> (Dealership X, 1000) 
    

    EDITED

    I assume you have a csv file like this

    City,Dealership,TotalCarsSold
    City1,Dealership A,2000
    City1,Dealership B,1000
    City2,Dealership X,1000
    City2,Dealership Y,500
    

    This is how I'll do it. Read it as Frame and get the column as Series and apply the same code above to get result.

    let df = 
      Frame.ReadCsv("C:/Temp/dealership.csv")
      |> Frame.indexRowsUsing(fun r -> r.GetAs<string>("City"), r.GetAs<string>("Dealership"))
    df?TotalCarsSold
    |> Series.groupBy (fun k _ -> fst k)
    |> Series.mapValues (fun sr ->
      let sorted = sr |> Series.sortBy(fun x -> -x)
      let key = sorted |> Series.firstKey |> snd
      let value = sorted |> Series.firstValue
      key, value )