Search code examples
f#aggregate

F# iterate sequence of objects and conditionally aggregate based on property


I was able to do this exercise in C# but I'm having trouble replicating this in F#. I've got a Sequence of the following TransactionFs type:

type TransactionFs(Debitor:string, Activity:string, Spend:float, Creditor:string) =
        member this.Debitor = Debitor
        member this.Activity = Activity
        member this.Spend = Spend
        member this.Creditor = Creditor

Sequence:

[
     FSI_0003+TransactionFs { Activity = "someActivity1";
                               Creditor = "alessio";
                               Debitor = "luca";
                               Spend = 10.0; };
     FSI_0003+TransactionFs {  Activity = "someActivity2";
                               Creditor = "alessio";
                               Debitor = "giulia";
                               Spend = 12.0; };
     FSI_0003+TransactionFs {  Activity = "someActivity3";
                               Creditor = "luca";
                               Debitor = "alessio";
                               Spend = 7.0; };
]

I'm trying to get a sequence of TransactionFs with the following rule. For each transaction, check the Debitor and Creditor; look in the sequence for all corresponding transactions where Debitor and Creditor are swapped and return a single TransactionFs with a Spend property that is the total debt due from the holder of the largest Spend (subtracting or summing the Spend appropriately). This Spend will represent the total debt due from the Debitor to the Creditor.

For example, the result for the Creditor and Debitor couple alessio and luca should be:

TransactionFs { Activity = "_aggregate_";
                Creditor = "alessio";
                Debitor = "luca";
                Spend = 3.0; };

Of course one way of doing this would be using nested for loops, but since I'm learning F# I would like to know what would be a proper functional way of doing this.


Solution

  • As a first step, I'd probably use Seq.groupBy to group the items into units with the same pair of people as creditor or debitor in either order. That way you end up with a list of lists of transactions, but it's all done in a single O(N) step. I.e.,

    let grouped = transactions |> Seq.groupBy (fun t ->
        let c, d = t.Creditor, t.Debitor
        if c < d then c, d else d, c
    )
    

    Now you have a sequence that looks roughly like this (in a pseudo-code mix of code and English):

    [
        (("alessio", "luca"), [luca gave alessio 10; alessio gave luca 7])
        (("alessio", "giulia"), [alessio gave giulia 12])
    ]
    

    The output of Seq.groupBy is a sequence of 2-tuples; the format of each 2-tuple is (group, items). Here, the group itself is a 2-tuple of (name1, name2), so the nested structure of the data is ((name1, name2), transactions).

    Now for each list of transations, you'll want to add up the sum, with some transactions being considered "positive" and some "negative" depending on whether they are the same as the (name1, name2) order or the reverse. I.e. in the first list of transactions, ones where Alessio paid Luca would be considered positive, and ones where Luca paid Alessio would be considered negative. Add up all those values, and if the difference is positive, then the debitor-creditor relationship is "name1 owes money to name2", otherwise it's the reverse. E.g.:

    let result = grouped |> Seq.map (fun ((name1, name2), transactions) ->
        let spendTotal = transactions |> Seq.sumBy (fun t ->
            let mult = if t.Debitor = name1 then +1.0 else -1.0
            t.Spend * mult
        )
        let c, d = if spendTotal > 0.0 then name1, name2 else name2, name1
        { Activity = "_aggregate_"
          Creditor = c
          Debitor = d
          Spend = spendTotal }
    )   
    

    Now your sequence looks something like:

    [
        (("alessio", "luca"), luca gave alessio 3 net)
        (("alessio", "giulia"), alessio gave giulia 12 net)
    ]
    

    Now we want to throw away the group names (the (name1, name2) pairs), and take just the second part of each tuple in the sequence. (Remember that the overarching structure of the sequence is (group, transactions). F# has a convenience function called snd for taking the second item of a 2-tuple. So the next step in the chain is simply:

    let finalResult = result |> Seq.map snd
    

    Putting all the pieces together, the code would look like this when arranged in a single pipeline without intermediate steps:

    let finalResult =
        transactions
        |> Seq.groupBy (fun t ->
            let c, d = t.Creditor, t.Debitor
            if c < d then c, d else d, c )
        |> Seq.map (fun ((name1, name2), transactions) ->
            let spendTotal = transactions |> Seq.sumBy (fun t ->
                let mult = if t.Debitor = name1 then +1.0 else -1.0
                t.Spend * mult
            )
            let c, d = if spendTotal > 0.0 then name2, name1 else name1, name2
            { Activity = "_aggregate_"
              Creditor = c
              Debitor = d
              Spend = spendTotal }
       |> Seq.map snd
    

    NOTE: Since you asked for "a proper functional way of doing this", I've written this using F# record syntax for your data objects. F# records provide a lot of useful functionality-by-default that you don't get with classes, like having comparison and hashcode functions already written for you. Plus records are immutable once created, so you never have to worry about concurrency in a multithreaded environment: if you have a reference to a record, no other code is going to change it out from under you without warning. However, if you're using classes then the syntax for creating the class will be different.

    NOTE 2: I'm only about 90% sure that I got the correct order of creditor/debitor throughout my code. Test this code, and if it turns out that I swapped them, then swap the appropriate parts (like the let c, d = ... line) of my code.

    I hope this step-by-step construction of the solution helps you better understand what the code is doing, and how to do things in proper functional style.