Search code examples
pythonrecursionfunctional-programmingimmutabilitycomputer-algebra-systems

Recursively operating on a tree structure: How do I get the state of the "entire" tree?


First, context:

As a side project, I'm building a computer algebra system in Python that yields the steps it takes to solve an equation.

So far, I've been able to parse algebraic expressions and equations into an expression tree. It's structured something like this (not the actual code—may not be running):

# Other operators and math functions are based off this.
# Numbers and symbols also have their own classes with 'parent' attributes.
class Operator(object):
    def __init__(self, *args):
        self.children = args            
        for child in self.children:
            child.parent = self

# the parser does something like this:
expr = Add(1, Mult(3, 4), 5)

On top of this, I have a series of functions that operate recursively to simplify expressions. They're not purely functional, but I'm trying to avoid relying on mutability for operations, instead returning a modified copy of the node I'm working with. Each function looks something like this:

def simplify(node):
    for index, child in enumerate(node.children):
        if isinstance(child, Operator):
            node.children[index] = simplify(node)
        else:
            # perform some operations to simplify numbers and symbols
            pass

    return node

The challenge comes in the "step by step" part. I'd like for my "simplification" functions to all be nested generators that "yield" the steps it takes to solve something. So basically, every time each function performs an operation, I'd like to be able to do something like this: yield (deepcopy(node), expression, "Combined like terms.") so that whatever is relying on this library can output something like:

5x + 3*4x + 3
5x + 12x + 3                 Simplified product 3*4x into 12x
17x + 3                      Combined like terms 5x + 12x = 17x

However, each function only has knowledge about the node it's operating on, but has no idea what the overall expression looks like.

So this is my question: What would be the best way of maintaining the "state" of the entire expression tree so that each "step" has knowledge of the entire expression?

Here are the solutions I've come up with:

  • Do every operation in place and either use a global variable or an instance variable in a class to store a pointer to the equation. I don't like this because unit testing is tougher, since now I have to set up the class first. You also lose other advantages of a more functional approach.
  • Pass through the root of the expression to every function. However, this either means I have to repeat every operation to also update the expression or that I have to rely on mutability.
  • Have the top level function 'reconstruct' the expression tree based on each step I yield. For example, if I yield 5x + 4x = 9x, have the top level function find the (5x + 4x) node and replace it with '9x'. This seems like the best solution, but how best to 'reconstruct' each step?

Two final, related questions: Does any of this make sense? I have a lot of caffeine in my system right now and have no idea if I'm being clear.

Am I worrying too much about mutability? Is this a case of premature optimization?


Solution

  • You might be asking about tree zippers. Check: Functional Pearl: Weaving a Web and see if it applies to what you want. From reading your question, I think you're asking to do recursion on a tree structure, but be able to navigate back to the top as necessary. Zippers act as a "breadcrumb" to let you get back to the ancestors of the tree.

    I have an implementation of one in JavaScript.