I need to generate all length-3 arrays whose values sum up to 1.0, i.e., valid convex-combination weights.
Let's assume that each element can be one of [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]. Hence, combinations like [0.0, 0.4, 0.6] or [0.2, 0.6, 0.2] are correct, as they sum up to 1.0, but a combination like [1.0, 0.4, 0.2] would be incorrect as it sums up to 1.6.
From this question and answer, I know how to generate all combinations of given arrays. Hence, I could do:
import numpy as np

ratios = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
result = np.stack(np.meshgrid(ratios, ratios, ratios), -1).reshape(-1, 3)
and then I could simply filter only those that do sum up to 1.0:
result[np.isclose(np.sum(result, axis=1), 1.0)]
However, this quickly becomes computationally intensive for higher-dimensional scenarios, which may easily have billions of combinations while only a tiny fraction (less than 1/1000) satisfies the convexity condition.
There is also a similar problem of finding all possible combinations of numbers that reach a given sum. However, in that scenario the dimension is not fixed, so [1.0] or [0.2, 0.2, 0.2, 0.2, 0.2] would both be valid solutions.
Is there a more efficient way, which assumes a fixed sum and fixed dimension?
Another (almost academic) solution is to use branch & bound.
Branch & bound explores all possible combinations recursively and keeps the correct ones. But because the exploration is recursive, the combinations are organized in an implicit tree, which gives the opportunity to discard whole subtrees without ever exploring them.
The ideal situation for branch & bound is when you can say: "there are 2 kinds of solutions: those with xxx, and those without. And among those, there are 2 kinds: those with yyy, and those without. Etc." Here: "there are 2 kinds of solutions: those containing 0.0, and those without 0.0. Among the first, there are 2 kinds: those with another 0.0, and those with only one. Among the second, 2 kinds: those with 0.2, and those without. Etc."
In your case the code could be
eps = 1e-5
ratios = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]

def convbb(a, t=1.0, nc=3, p=[]):
    # a: candidate ratios still allowed (sorted in decreasing order)
    # t: remaining target sum, nc: values still to pick, p: partial solution
    if nc == 0:
        if t < eps:
            return [p]   # nc values picked and they reach the target
        return []
    if len(a) == 0:
        return []        # no candidate left
    sols = []
    if a[0] <= t + eps:
        # branch 1: pick a[0] (it stays available, so it may be picked again)
        sols = convbb(a, t - a[0], nc - 1, p + [a[0]])
    # branch 2: discard a[0] for good
    return sols + convbb(a[1:], t, nc, p)

convbb(ratios[::-1])
# [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0], [0.6, 0.4, 0.0], [0.6, 0.2, 0.2], [0.4, 0.4, 0.2]]
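As a quick sanity check (mine, assuming the ratios and convbb defined just above), the result matches the brute-force meshgrid + filter from your question once permutations are collapsed:

import numpy as np

brute = np.stack(np.meshgrid(ratios, ratios, ratios), -1).reshape(-1, 3)
brute = brute[np.isclose(brute.sum(axis=1), 1.0)]
expected = {tuple(sorted(row)) for row in brute.tolist()}    # collapse permutations
found = {tuple(sorted(sol)) for sol in convbb(ratios[::-1])}
print(found == expected)   # should print True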
The advantage of this over the itertools method is not obvious for your example. On my machine, it is twice as slow.
It explores fewer combinations (the itertools method explores all of them; this one obviously never explores any solution starting with [1.0, 0.2], without even bothering to choose a 3rd value). But because it is pure Python (plus recursion, which is comfortable but not optimal; plus my usage of list concatenation, which should be replaced by a static list; plus other non-optimal coding), it is slower.
Its advantage shows for a bigger number of ratios. For example, with 150 values, this code is twice as fast.
And, more importantly, it shows for a bigger number of values in the result. With 3, the advantage of cutting the exploration early is never obvious, since at least 2 values of a partial combination are needed before it becomes clear that there is no point in choosing a 3rd; so all that can be cut is the choice of the 3rd value.
But if you use this, for example, to find combinations of 6 values, then the itertools method becomes explosively expensive, while this one remains manageable. On my computer, it is 5 seconds for this solution vs 900 seconds for the itertools one. For 7 values, this solution takes less than 10 seconds, while the itertools one stands no chance of finishing before the end of the day (whatever timezone you live in :D).
But while I was typing that, you said in a comment that the ratios are evenly spaced. That changes everything! Because if they are evenly spaced, from 0 to 1, then we know all the working solutions in advance. They are
[(ratios[i], ratios[j], ratios[-i-j-1]) for i in range(len(ratios)) for j in range(i, (len(ratios)-i+1)//2)]
Because in that case, your problem is equivalent to finding 3 integers in the range [0, len(ratios)-1] (the indices of the ratios) whose total is len(ratios)-1.
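For instance, with the 6 ratios from your question (a quick check of my own), that comprehension gives exactly the 5 valid combinations, now listed in non-decreasing order:

ratios = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
print([(ratios[i], ratios[j], ratios[-i-j-1])
       for i in range(len(ratios))
       for j in range(i, (len(ratios) - i + 1) // 2)])
# should print [(0.0, 0.0, 1.0), (0.0, 0.2, 0.8), (0.0, 0.4, 0.6), (0.2, 0.2, 0.6), (0.2, 0.4, 0.4)]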
For more than nc=3 numbers, the same logic applies, with more recursion.
For an unknown nc, a recursive function can also be written, and it is far simpler than my previous one. It is just about finding nc integers a₀ ≤ a₁ ≤ ... such that a₀ + a₁ + ... = N, where N = len(ratios) - 1.
def recursFind(N, nc=3, i=0, t=0, p=[]):
    # N: target sum of the indices, nc: indices still to pick,
    # i: minimum allowed value for the next index (keeps solutions non-decreasing),
    # t: sum of the indices picked so far, p: partial solution
    if nc == 1:
        # No need to explore, the last value is forced to N-t
        if N - t >= i:
            yield p + [N - t]
        # else: p+[N-t] would be a solution, but it has already been given in another order
    elif i * nc + t > N:
        return  # impossible to find nc values >= i (values < i would only repeat already-given solutions)
    else:
        for j in range(i, N):
            yield from recursFind(N, nc - 1, j, t + j, p + [j])

nc = 3  # number of values per combination
[[ratios[i] for i in idx] for idx in recursFind(len(ratios) - 1, nc)]
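For what it's worth (a quick check of my own, assuming the ratios and recursFind above), that comprehension reproduces the 5 combinations found earlier, and the same generator handles a bigger nc without any change:

print([[ratios[i] for i in idx] for idx in recursFind(len(ratios) - 1, 3)])
# should print [[0.0, 0.0, 1.0], [0.0, 0.2, 0.8], [0.0, 0.4, 0.6], [0.2, 0.2, 0.6], [0.2, 0.4, 0.4]]

print(len(list(recursFind(len(ratios) - 1, 6))))
# should print 7: the number of multisets of 6 ratios summing to 1.0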
That's one possible implementation. It is really fast compared to the others (again, because it is not the same problem once you say the ratios are evenly spaced). And it could be faster, because I was lazy: I could have computed an upper bound instead of N in the line for j in range(i, N), like I did for the nc=3 case, removing the need for the i*nc+t>N cut. But it is fast enough to make my point.
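To make that concrete, here is a minimal sketch of that tighter bound (the name recursFindTight is mine): the next value j and the nc-1 values after it are all at least j, so any usable j satisfies j*nc + t <= N, and the loop bound replaces the explicit cut.

def recursFindTight(N, nc=3, i=0, t=0, p=[]):
    if nc == 1:
        # the bound below guarantees N - t >= i, so the last value can be emitted directly
        yield p + [N - t]
    else:
        # j and the nc-1 values after it are all >= j, hence j <= (N - t) // nc
        for j in range(i, (N - t) // nc + 1):
            yield from recursFindTight(N, nc - 1, j, t + j, p + [j])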
If you have just a reasonable set of ratios and are searching for no more than 3 of them, then the itertools version is the way to go.
If you have more ratios and, more importantly, may want to find combinations of more than 3 of them, then the branch & bound solution is the way to go.
Unless, as you implied in a comment, the ratios are evenly spaced from 0 to 1. In that case it is a much simpler problem, and my last solution (or even better ones) is what you are looking for.