Search code examples
treegraph-theorymathematical-optimizationcombinatorics

Calculate an upper bound on a tree containing sum nodes, choice nodes, and requiring consistent choices


I have a graph containing sum nodes, choice nodes and point nodes. I'd like to maximize the number of points at the root of the graph.

  • The value you get for a point node is the number of points on that node.
  • The value of a sum node is the sum of the values of its children.
  • The value for a choice node is the value of one of its children, but you have to make consistent choices for nodes with the same number across different subtrees.

Here's an example (graphviz):

A tree containing sum nodes and choice nodes

There are four possible choices here, which result in the following number of points:

  • 1=x, 2=a → 3 (1 + 2)
  • 1=x, 2=e → 2 (1 + 1)
  • 1=y, 2=a → 3 (1 + 1 + 1)
  • 1=y, 2=e → 5 (1 + 3 + 1)

So the maximum value of this tree is 5, achieved by choosing 1=y and 2=e. Notably this is less than what you'd get if you took the maximum value at each choice node. That would yield 6 points (3 + 1 + 2) but would require inconsistent choices (1=y/2=e on the left subtree, 1=x/2=a on the right subtree).

This is a small example, but in practice there are 10-20 choices to be made and 5-10 options for each choice: too many to enumerate exhaustively.

My questions are:

  • How can I I efficiently find the set of choices that maximizes the points at the root of the tree?
  • If it's not possible to find the exact maximum, what are some options for finding an upper bound? Taking the maximum value at each choice node does yield an upper bound, but this is typically much too loose.
  • Does this seem related to any other data structures and algorithms problem?

Here's a gist with an example tree and Python code to load it. Some stats on the example:

  • 411,315 nodes, depth of 19
  • 52 options across 9 choices, 6,075,000 possible combinations
  • 8,346 - Upper bound from taking the max at each node
  • 663 - true maximum (choices=[5, 0, 4, 5, 1, 2, 3, 1, 1], found by exhaustive search)

This tree is still relatively small, but it gives a flavor of what these look like.

What have I tried so far? Many things, but a few highlights:

  • Make incremental choices You can pick a single choice (say #1) and get an upper bound for the tree assuming you take each possible option (1=x, 1=y). These bounds will be lower than what you get if you make no choices. If this bound is low enough, great. Otherwise, for each of those options, pick another choice (say #2) and repeat until you get all the bounds low enough. This incrementally synchronizes the choices and doesn't use much extra memory. I have code to do that (with an implicit tree) here.
  • Pivot choices up the tree The bounds you get from from taking the max at every node are imprecise because choices appear in different orders in different subtrees. So, to improve the bound, you can "pivot" a particular choice all the way up the tree. This increases the size of the tree but reduces the bound. I have code to do that (with a different tree representation) here. If you repeatedly lift different choices, your bound keeps going down and you eventually get the true maximum. The cost is that you use a lot of space and compute.
  • Choice Mask You can compute and maintain a mask of which choices appear below each node. This makes it possible to avoid unnecessary subtree traversals in both of the previous two strategies.
  • De-dupe Nodes The main reason the "pivot" operation expands the tree is because it has to duplicate subtrees when you pivot through a choice or sum node. So by memoizing results for subtrees (or, equivalently, thinking of the tree as a DAG), you can eliminate duplicate computation.

So I've tried a lot! Between these I'm able to get a 1,000+x speedup over exhaustive search, depending on the tree. But I still have the sneaking sense that there's some better way to think about this (dynamic programming?), or a known algorithm that this maps onto.


Solution

  • The problem is NP-hard. Suppose you have a 3-CNF formula with $m$ clauses on variables $x_1,\dots,x_n$. We'll build a tree whose root can achieve $m$ points iff the formula is satisfiable, otherwise its root can only achieve $\le m-1$ points. Given any 3-CNF clause, it is easy to build a subtree of depth 3 (with 4 leaves) whose value will be 1 if the clause is satisfied and will be 0 if the clause is not satisfied. (Each choice node is numbered from 1 to $n$; a choice node with number $i$ corresponds to variable $x_i$.). Then the root node is a sum node with $m$ children; each child of the root is a subtree corresponding to a clause of the formula.

    So you should not expect any efficient algorithm that is always correct and scales well to large problem instances.

    In practice you could try to solve this with an integer linear programming (ILP) solver. For each separate choice node, add a zero-or-one integer variable. Also add a variable for each node of the tree expressing the value on that node. Then you can write constraints that express how the value on each node relates to the values on its children, and use the ILP solver to maximize the value on the root node. Whether this will work well enough will depend on your problem instance; the only way to know is to try it and see.