I am working on a symbolic algebra system. I'm currently looking at how to carry out polynomial addition/multiplication from a binary parse tree. I will later consider this to be a general ring.
Parsing is not relevant here -- this intended to be the output of parsing. The parse tree that is formed. If something could be improved here, I'm certainly happy to learn about that too.
I 'feel' that this tree structure could be folded/crushed, yet I'm not too clear on how to go about it. I believe I can hack something together, but as I'm still learning Haskell, I wanted to learn what the more advanced users would do.
Below is relevant code background.
My two relevant data declarations are:
data Op = Add | Mul
data Tree a = Leaf a | Node Op [Tree a] [Tree a]
Here's one of my test examples:
-- 3*(x+2)*(x+(5*4))
test = Node Mul [
Node Mul
[Leaf "3"]
[Node Add
[Leaf "x"]
[Leaf "2"]
]
]
[Node Add
[Leaf "x"]
[Node Mul
[Leaf "5"]
[Leaf "4"]
]
]
This is a typical recursive tree type, with a node holding an operation as well as lists of left and right trees. Computation proceeds by the below operations. Note: they are all string operations for now. I need them to be as general as possible (but will later add further structure, like multiplicative commutativity).
prod :: [Tree [a]] -> [Tree [a]] -> [Tree [a]]
prod ls rs = [Leaf (l ++ r) | (Leaf l) <- ls, (Leaf r) <- rs]
add :: [Tree [a]] -> [Tree [a]] -> [Tree [a]]
add l r = l ++ r
opOnLeaf :: Op -> [Tree [a]] -> [Tree [a]] -> [Tree [a]]
opOnLeaf op l r
| op == Add = add l r
| op == Mul = prod l r
operate :: Tree [a] -> [Tree [a]]
operate (Node op [Node op2 l2 r2] [Node op3 l3 r3]) = operate (Node op (operate (Node op2 l2 r2)) (operate (Node op3 l3 r3)))
operate (Node op [Node op2 l2 r2] r) = operate (Node op (operate (Node op2 l2 r2)) r)
operate (Node op l [Node op2 l2 r2]) = operate (Node op l (operate (Node op2 l2 r2)))
operate (Node op l r) = opOnLeaf op l r
I think my variable names in operate
are hideous, but I wanted to ensure it functioned first. Making it all hideous makes it stands out more too. It's practically yelling at me to find a better way.
Now, I'd like to use fold or foldMap, but I run into the issue that I have distinct monoids at each operation (namely (*)
and (+)
. And that's just the beginning -- I will be adding several other operations. So I'd need something like a 'variable monoid', or perhaps some other map whose operation was to first acquire the monoid. I'm not sure.
The query: given this binary tree, and multiple operations at each node, how would one proceed to fold up the structure? Is there a better way to write this as an instance of Foldable, or some other structure? I also aim to add several other binary operations. Ideally, I'd like a structure that'd be indefinitely extensible. I assume once it's solved for 2, we could solve it effectively for n.
"Fold" has two related but distinct meanings in common parlance.
Fold
as in Foldable
. Viewing the datatype as a container, Foldable
gives you access to the container's elements while throwing away any information about the shape of the container itself. (This is also the sense in which lens
's Fold
uses the term.) Not every datatype is Foldable
— only those which can be viewed as a container.The two meanings of "fold" happen to coincide when the datatype you're talking about is []
(which partly explains the confusion over the two meanings), but in general they don't. Your Tree
happens to be a suitable candidate for either type of fold, and from your question I can't quite tell which one you're after, so I'll show you how to do both.
The easiest way to write an instance of Foldable
is to turn on DeriveFoldable
.
{-# LANGUAGE DeriveFoldable #-}
data Tree a = Leaf a | Node Op [Tree a] [Tree a]
deriving (Foldable)
But for the sake of understanding I'll write out the instance here:
instance Foldable Tree where
-- foldr :: (a -> b -> b) -> b -> Tree a -> b
foldr f z (Leaf x) = f x z
foldr f z (Node _ ls rs) = foldr foldTree z (ls ++ rs)
where foldTree t z' = foldr f z' t
The idea is to traverse the tree to the bottom, applying f
to the values inside each Leaf
. Couple of things to note about this code:
foldr
is an overloaded function (it's a method of Foldable
), and in the second clause I'm using it at two different types. The foldr
in foldTree
is a recursive call to Tree
's definition of foldr :: (a -> b -> b) -> b -> Tree a -> b
. The foldr
in the line above it is a call to the usual list foldr :: (a -> b -> b) -> b -> [a] -> b
.foldr
discards information! In the Node
case, you can see from the _
that the Op
is dropped on the floor. We also smash ls
and rs
together into a single list, forgetting that they used to be in two lists.So Foldable
is all about finding the items inside a container, while discarding any information about the container itself. To put it bluntly, Foldable
is "toList
-as-a-class".
Foldable
is a great fit for container types and other abstract data types, where you want to expose a collection-like interface while hiding the internal representation of the data structure. In your case, you mentioned that you wanted to apply an operation +
or *
depending on the Op
inside the tree, but the Op
is discarded by Foldable
. So it seems like Foldable
is not the right tool for what you're trying to do.
If you want to reduce your datatype to a summary value without throwing away any information about the structure, you need a catamorphism.
To derive the signature for a catamorphism, follow this recipe.
cata
takes a value of your datatype and returns a summary value b
.cata
. Each of those arguments will be a function which returns a b
.b
.Let's follow these steps for your Tree
. First of all, cataTree
takes a Tree
and returns a summary value b
.
cataTree :: ??? -> Tree a -> b
Looking at Tree
's definition, we see two constructors. So cataTree
will have two function arguments, one for Leaf
and one for Node
.
cataTree :: ({- Leaf -} ??? -> b) -> ({- Node -} ??? -> b) -> Tree a -> b
Looking at the Leaf
constructor, we see one field. So the Leaf
function has a single argument.
cataTree :: (a -> b) -> ({- Node -} ??? -> b) -> Tree a -> b
Now let's look at Node
. Node
has three arguments, but two of them are lists of Tree
s. We want to replace each of those Tree
s with its corresponding summary value. So the final signature for cataTree
is
cataTree :: (a -> b) -> (Op -> [b] -> [b] -> b) -> Tree a -> b
Implementing cataTree
is a matter of following the types.
cataTree leaf _ (Leaf x) = leaf x
cataTree leaf node (Node op ls rs) =
node op (fmap (cataTree leaf node) ls) (fmap (cataTree leaf node) rs)