Search code examples
lisplanguage-designrecursive-datastructurescircular-list

Detecting shared structure in tree made of cons cells


I'm writing a programming language (details irrelevant) which uses Lisp-like cons cells to store its data (this makes implementing the garbage collector easy). I'll spare you the details of everything else, but I'm trying to write a function that will allow code written in this programming language to detect when an object is self-referential (such as Lisp #1=(#1#)), or if two objects have shared structure (such as (#1=(123) #1#)

This is the best (recursive) function I have...

bool points_to(obj* from, obj* to) {
    if (from == NULL) return false;
    if (from == to || points_to(from->car, to)) return true;
    return points_to(from->cdr, to);
}

As you can see, if I have a circular list #1=(1 2 3 . #1#) and call points_to(x, x) it will easily tell me that the list is circular. However, if I try another structure that contains a self-referential object, but the head passed in isn't self-referential (such as (1 2 . #1=(3 . #1)), the above function will either blow the call stack or hang forever (depending on whether the last call is tail-call optimized by the compiler) when it tries to search the self-referential part for a cons cell that the self-referential section doesn't point to.

Obviously there is an answer to this question (there are Lisp/Scheme implementations that can detect this when printing an object and print out the object with the #N=/#N# reference notation or something functionally identical), but I cannot find anything about what this algorithm actually is.

If it helps, I'm writing this in C++, not Lisp, so if needed I can sweep the entire image (every object that was ever allocated) and/or use the garbage collector's recursive marking functionality.


Solution

  • Here is how I have done this. There may be much better ways. You need to walk the object twice.

    You have a hash table: walk over the object and for each interesting thing either enter it in the table with a value of 1 or increment its value if it's already there. If the thing is in the table already do not walk into it, just bump the count (actually you never need to bump it above 2). 'Interesting' is a bit complicated: for CL conses are interesting, interned symbols aren't, uninterned symbols are, and so on. It is safe (but annoying for people reading the output) to think things are interesting when they are not, but not safe to think they are not when they are).

    Once you have done this walk the object again to print it, using the table you built above and a counter, initially 1.

    • If you get to an object whose value is > 1, set its value in the table to the negative of the counter and increment the counter. This is the first time you've seen this object when printing so use the previous counter value for #1=.
    • If you get to an object whose entry in the table is negative, then this is another occurrence of something we've seen before so use the negative of the value in the table for #1#.

    You will then spend a really long time getting the thing to be able to print consing dots in the right places.