Given immutability of data in Rascal, what is the preferred method for these, if later operations depend on the results of earlier ones?
For example, assigning annotation values to every node in a tree, where values of higher nodes depend on values of lower ones. If you write a single visit statement with multiple cases, then insertion statements at lower levels don't change the tree, so higher level operations may have nothing on which to operate. On the other hand, surrounding each case statement with a visit statement -- and rebinding your tree variable to the new tree after every visit -- is cumbersome and, worse, seems to make the results depend on the order of the statements.
Subtle question :-) The visit
statement has a subtle semantics especially if you look at it from an OO perspective. While it is actively traversing a value it is also rebuilding a new one. Depending on the strategy (order) of the traversal sequential matches in its case statement "see" different things.
It is sometimes insightful to imagine visit as a code generator which generates a set of mutually recursive functions which take the visited part as argument and return a new value on return. The body of the visit (the cases) turn into a switch statement which is the core of these generated functions. Then, depending on the traversal order, the recursive steps are either positioned before (bottom-up
) or after this switch statement (top-down
):
// bottom-up visit
x f(value x) {
newChildren = map(f, children of x);
x = newX(x, newChildren);
switch (x) {
case y : return whatever(y);
}
}
Hence the code in the switch case for bottom-up
visits see the trees produced by recursive calls (although no tree has actually been updated).
Here's an example:
rascal>data T = tee(T, T) | i(int i);
data T = tee(T, T) | i(int i);
ok
rascal>t = tee(tee(i(1),i(2)),tee(i(3),i(4)));
t = tee(tee(i(1),i(2)),tee(i(3),i(4)));
T: tee(
tee(
i(1),
i(2)),
tee(
i(3),
i(4)))
rascal>visit(t) {
>>>>>>> case i(x) => i(x+1)
>>>>>>> case tee(i(x),y) => tee(i(x+1),y)
>>>>>>>}
T: tee(
tee(
i(3), // here you see how 1 turned into 3 by incrementing twice
i(3)), // increment happened once here
tee(
i(5), // increment happened twice here too
i(5))) // increment happened once here
You can see that some nodes had an increment twice because the second case matched
one a tee
after it's child i
node was already visited to return a different i
node which is already incremented.
Experimenting with the other strategies will give you other results, see http://tutor.rascal-mpl.org/Rascal/Rascal.html#/Rascal/Expressions/Visit/Visit.html . Note that the variables in the scope of the visit statement are shared by all visits on all levels of recursion which gives power to emulate zipper-like behavior (you can always store the previously visited node in a temporary).
As an aside, the language design is trying to avoid the need for more involved "functional programming design patterns" such as zippers because they complicate the type system and the way people interact with it. To make these things work type correctly in a visit which does a recursion over a heterogeneous data-type you need a PhD in polymorphism to understand when it is type correct. Secretly, the visit statement simulates a set of built-in type-safe rank-2 higher-order polymorphic functions, but that's all under the hood.