I have a sequence of (key, value) pairs like
[("a", 1), ("a", 2), ("a", 111), ("b", 3), ("bb", 1), ("bb", -1), ...]
What is the most efficient way to convert it into a sequence like
[("a", [1,2,111]), ("b", [3]), ("bb", [1,-1])]
or similar?
The sequence has the following property: it's really big (>2Gb).
This makes Seq.groupBy really inefficient, and it also gives the wrong result. Are there any other ways to do it?
P.S.: this sequence:
[("a", 1), ("a", 2), ("a", 111), ("bb", 1), ("bb", -1), ("a", 5), ("a", 6), ...]
should be converted to
[("a", [1,2,111]), ("bb", [1,-1]), ("a", [5,6]), ...]
--
edit #1: Fixed incorrect sample
edit #2: The sequence is big, so a lazy (or the fastest) solution is preferred
If you want the option to get lazy results, then I don't think there's an elegant way without maintaining mutable state. Here's a relatively straightforward one with mutation. You keep track of the last key you saw and all the values collected for it so far:
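let s = [("a", 1); ("a", 2); ("a", 111); ("bb", 1); ("bb", -1); ("a", 5); ("a", 6)]
let s2 =
    [
        // Key of the run currently being collected (None before the first element)
        let mutable prevKey = None
        // Values collected so far for prevKey (the buffer itself is never reassigned)
        let values = System.Collections.Generic.List<_>()
        // Start a new run with the given key and its first value
        let init key value =
            prevKey <- Some key
            values.Clear()
            values.Add value
        for (key, value) in s do
            match prevKey with
            | None -> init key value
            | Some k when k = key -> values.Add value
            | Some k ->
                // Key changed: emit the finished run, then start the next one
                yield (k, List.ofSeq values)
                init key value
        // Emit the last run, if the input was not empty
        match prevKey with
        | Some k -> yield (k, List.ofSeq values)
        | _ -> ()
    ]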
This gives:
val s2 : (string * int list) list =
[("a", [1; 2; 111]); ("bb", [1; -1]); ("a", [5; 6])]
For lazy evaluation, replace the [ ... ] with seq { ... }.
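To make the lazy variant concrete, here is a rough sketch of the same idea wrapped in a reusable function. The name groupAdjacent is my own (it is not a library function), and ResizeArray is just the F# alias for System.Collections.Generic.List. It assumes the input only needs to be walked once, front to back:

```fsharp
// Hypothetical helper: groups *adjacent* pairs that share a key, lazily.
let groupAdjacent (source: seq<'k * 'v>) =
    seq {
        // Key of the run currently being collected
        let mutable prevKey : 'k option = None
        // Values collected so far for that key
        let values = ResizeArray<'v>()
        for (key, value) in source do
            match prevKey with
            | Some k when k = key -> values.Add value
            | Some k ->
                // Key changed: emit the finished run, then start a new one
                yield (k, List.ofSeq values)
                values.Clear()
                values.Add value
                prevKey <- Some key
            | None ->
                // Very first element
                values.Add value
                prevKey <- Some key
        // Emit the last run, if there was any input at all
        match prevKey with
        | Some k -> yield (k, List.ofSeq values)
        | None -> ()
    }

// Works on an infinite source because only the requested groups are computed
let firstTwo =
    Seq.initInfinite (fun i -> (i / 3, i))
    |> groupAdjacent
    |> Seq.take 2
    |> List.ofSeq
// firstTwo = [(0, [0; 1; 2]); (1, [3; 4; 5])]
```

Only one run of values is held in memory at a time, so this should cope with a very large input as long as no single run is huge. Each enumeration of the result walks the source again from the start.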