Search code examples
f#mapreduceterminologyfold

F#: What to call a combination of map and fold, or of map and reduce?


A simple example, inspired by this question:

module SimpleExample =
    let fooFold projection folder state source =
        source |> List.map projection |> List.fold folder state
    // val fooFold :
    //   projection:('a -> 'b) ->
    //     folder:('c -> 'b -> 'c) -> state:'c -> source:'a list -> 'c

    let fooReduce projection reducer source =
        source |> List.map projection |> List.reduce reducer
    // val fooReduce :
    //   projection:('a -> 'b) -> reducer:('b -> 'b -> 'b) -> source:'a list -> 'b

    let game = [0, 5; 10, 15]
    let minX, maxX = fooReduce fst min game, fooReduce fst max game
    let minY, maxY = fooReduce snd min game, fooReduce snd max game

What would be a natural name for the functions fooFold and fooReduce in this example? Alas, mapFold and mapReduce are already taken.

mapFold is part of the F# library and does a fold operation over the input to return a tuple of 'result list * 'state, similar to scan, but without the initial state and the need to provide the tuple as part of the state yourself. Its signature is:

val mapFold : ('State -> 'T -> 'Result * 'State) -> 'State -> 'T list -> 'Result list * 'State

Since the projection can easily be integrated into the folder, the fooFold function is only included for illustration purposes.

And MapReduce:

MapReduce is an algorithm for processing huge datasets on certain kinds of distributable problems using a large number of nodes


Now for a more complex example, where the fold/reduce is not directly applied to the input, but to the groupings following a selection of the keys. The example has been borrowed from a Python library, where it is called - perhaps misleadingly - reduceby.

module ComplexExample =
    let fooFold keySelection folder state source =
        source |> Seq.groupBy keySelection 
        |> Seq.map (fun (k, xs) ->
            k, Seq.fold folder state xs) 
    // val fooFold :
    //   keySelection:('a -> 'b) ->
    //     folder:('c -> 'a -> 'c) -> state:'c -> source:seq<'a> -> seq<'b * 'c>
    //     when 'b : equality

    let fooReduce keySelection projection reducer source =
        source |> Seq.groupBy keySelection 
        |> Seq.map (fun (k, xs) ->
            k, xs |> Seq.map projection |> Seq.reduce reducer) 
    // val fooReduce :
    //   keySelection:('a -> 'b) ->
    //     projection:('a -> 'c) ->
    //     reducer:('c -> 'c -> 'c) -> source:seq<'a> -> seq<'b * 'c>
    //     when 'b : equality

    type Project = { name : string; state : string; cost : decimal }
    let projects =
        [ { name = "build roads";  state = "CA"; cost = 1000000M }
          { name = "fight crime";  state = "IL"; cost = 100000M  }
          { name = "help farmers"; state = "IL"; cost = 2000000M }
          { name = "help farmers"; state = "CA"; cost = 200000M  } ]
    fooFold (fun x -> x.state) (fun acc x -> acc + x.cost) 0M projects
    // val it : seq<string * decimal> = seq [("CA", 1200000M); ("IL", 2100000M)]

    fooReduce (fun x -> x.state) (fun x -> x.cost) (+) projects
    // val it : seq<string * decimal> = seq [("CA", 1200000M); ("IL", 2100000M)]

What would be the natural name for the functions fooFold and fooReduce here?


Solution

  • I'd probably call the first two mapAndFold and mapAndReduce (though I agree that mapFold and mapReduce would be good names if they were not already taken). Alternatively, I'd go with mapThenFold (etc.), which is perhaps more explicit, but it reads a bit cumbersome.

    For the more complex ones, reduceBy and foldBy sound good. The issue is that this would not work if you also wanted a version of those functions that do not do the mapping operation. If you wanted that, you'd probably need mapAndFoldBy and mapAndReduceBy (as well as just foldBy and reduceBy). This gets a bit ugly, but I'm afraid that's the best you can do.

    More generally, the issue when comparing names with Python is that Python allows overloading whereas F# functions do not. This means that you need to have a unique name for functions that would have multiple overloads. This means that you just need to come up with a consistent naming scheme that will not make the names unbearably long.

    (I experienced this when coming up with names for the functions in the Deedle library, which is somewhat inspired by Pandas. You can see for example the aggregation functions in Deedle for an example - there is a pattern in the naming to deal with the fact that each function needs a unique name.)