Search code examples
algorithmcomputer-sciencegreedy

Algorithm to "transfer water from a set of bottles to another one" (metaphorically speaking)


Ok, I have a problem. I have a set "A" of bottles of various sizes, all full of water. Then I have another set "B" of bottles of various sizes, all empty.

I want to transfer the water from A to B, knowing that the total capacity of each set is the same. (i.e.: Set A contains the same amount of water as set B).

This is of course trivial in itself, just take the first bottle in B, pour it in the first in A until this is full. Then if the bottle from B has still water in it, go on with the second bottle in A, etc.

However, I want to minimize the total number of pours (the action of pouring from a bottle into another, each action counts 1, independently from how much water it involves)

I'd like to find a greedy algorithm to do this, or if not possible at least an efficient one. However, efficiency is secondary to correctness of the algorithm (I don't want a suboptimal solution).

Of course this problem is just a metaphor for a real problem in a computer program to manage personal expenses.


Solution

  • Bad news: this problem is NP-hard by a reduction from subset sum. Given numbers x1, …, xn, S, the object of subset sum is to determine whether or not some subset of the xis sum to S. We make A-bottles with capacities x1, …, xn and B-bottles with capacities S and (x1 + … + xn - S) and determine whether n pours are sufficient.

    Good news: any greedy strategy (i.e., choose any nonempty A, choose any unfilled B, pour until we have to stop) is a 2-approximation (i.e., uses at most twice as many pours as optimal). The optimal solution uses at least max(|A|, |B|) pours, and greedy uses at most |A| + |B|, since every time greedy does a pour, either an A is drained or a B is filled and does not need to be poured out of or into again.

    There might be an approximation scheme (a (1 + ε)-approximation for any ε > 0). I think now it's more likely that there's an inapproximability result – the usual tricks for obtaining approximation schemes don't seem to apply here.


    Here are some ideas that might lead to a practical exact algorithm.

    Given a solution, draw a bipartite graph with left vertices A and right vertices B and an (undirected) edge from a to b if and only if a is poured into b. If the solution is optimal, I claim that there are no cycles – otherwise we could eliminate the smallest pour in the cycle and replace the lost volume going around the cycle. For example, if I have pours

    a1 -> b1: 1
    a1 -> b2: 2
    a2 -> b1: 3
    a2 -> b3: 4
    a3 -> b2: 5
    a3 -> b3: 6
    

    then I can eliminate by a1 -> b1 pour like so:

    a2 -> b1: 4 (+1)
    a2 -> b3: 3 (-1)
    a3 -> b3: 7 (+1)
    a3 -> b2: 4 (-1)
    a1 -> b2: 3 (+1)
    

    Now, since the graph has no cycle, we can count the number of edges (pours) as |A| + |B| - #(connected components). The only variable here is the number of connected components, which we want to maximize.

    I claim that the greedy algorithm forms graphs that have no cycle. If we knew what the connected components of an optimal solution were, we could use a greedy algorithm on each one and get an optimal solution.

    One way to tackle this subproblem would be to use dynamic programming to enumerate all subset pairs X of A and Y of B such that sum(X) == sum(Y) and then feed these into an exact cover algorithm. Both steps are of course exponential, but they might work well on real data.