Tags: f#, time-series, deedle

Deedle: grouping a time series into the top 3 and the rest


I have a Deedle series with election data like:

   "Party A", 304
   "Party B", 25 
   "Party C", 570
   ....
   "Party Y", 2
   "Party Z", 258

I'd like to create a new series like this:

   "Party C", 570
   "Party A", 304 
   "Party Z", 258
   "Others", 145

So I want to keep the top 3 as they are and sum all the others into a new row. What is the best way to do this?


Solution

  • I don't think we have anything in Deedle that would make this a one-liner (how disappointing...). So the best I could think of is to get the keys of the top 3 parties and then use Series.groupInto with a key selector that returns either the party name (for the top 3) or "Others" (for all the other parties):

    // Assumes the Deedle package is referenced (e.g. via NuGet)
    open Deedle

    // Sample data set with a bunch of parties
    let election =
     [ "Party A", 304
       "Party B", 25
       "Party C", 570
       "Party Y", 2
       "Party Z", 258 ]
     |> series
    
    // Sort by the negated value, i.e. in descending order of votes
    let byVotes = election |> Series.sortBy (~-)
    // Create a set with the top 3 keys (for efficient lookup)
    let top3 = byVotes |> Series.take 3 |> Series.keys |> set
    
    // Group the series using a key selector that keeps the party name if it is in top3
    // and maps every other party to "Others"; the aggregation function sums the values
    // of each group (whether it contains one value or several).
    // The result is a Series<string, float>.
    byVotes |> Series.groupInto 
        (fun k _ -> if top3.Contains(k) then k else "Others")
        (fun _ s -> s |> Series.mapValues float |> Stats.sum)
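If you want to see the same key-selector idea without Deedle, here is a minimal sketch using only FSharp.Core list functions (`List.sortByDescending`, `List.truncate`, `List.groupBy`, `List.sumBy`). It swaps Deedle's `Series.groupInto` for plain `List.groupBy`, so it is an illustration of the technique rather than Deedle's API; the helper `topNAndOthers` and its parameters are hypothetical names. It uses the same five-party sample, so "Others" here totals only Party B and Party Y (27), not the 145 from the question, which includes the elided parties.

```fsharp
// Plain F# lists only (no Deedle): a sketch of the "top N + Others" grouping.
// topNAndOthers is a hypothetical helper, not part of Deedle or FSharp.Core.
let topNAndOthers (n: int) (othersLabel: string) (data: (string * int) list) =
    // Sort descending by vote count (mirrors Series.sortBy (~-) above)
    let byVotes = data |> List.sortByDescending snd
    // Set of the top-n party names for fast membership tests
    let topKeys = byVotes |> List.truncate n |> List.map fst |> Set.ofList
    byVotes
    // Key selector: keep the party name if it is in the top n, else use the shared label
    |> List.groupBy (fun (party, _) -> if topKeys.Contains party then party else othersLabel)
    // Aggregate each group by summing its vote counts
    |> List.map (fun (key, rows) -> key, rows |> List.sumBy snd)

let election =
    [ "Party A", 304
      "Party B", 25
      "Party C", 570
      "Party Y", 2
      "Party Z", 258 ]

let result = topNAndOthers 3 "Others" election
// result = [("Party C", 570); ("Party A", 304); ("Party Z", 258); ("Others", 27)]
```

Sorting before grouping matters here: in practice `List.groupBy` yields groups in the order their keys first appear, so the top parties come out in descending order with "Others" last. If `n` is at least the number of parties, no "Others" group is produced.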