Linking in tree structures

Upon working with long strings now, I came across a rather big problem in creating suffix trees in Haskell.

Some constructing algorithms (as this version of Ukkonen's algorithm) require establishing links between nodes. These links "point" on a node in the tree. In imperative languages, such as Java, C#, etc. this is no problem because of reference types.

Are there ways of emulating this behaviour in Haskell? Or is there a completely different alternative?

Solution

You can use a value that isn't determined until the result of a computation in the construction of data in the computation by tying a recursive knot.

The following computation builds a list of values that each hold the total number of items in the list even though the total is computed by the same function that's building the list. The let binding in zipCount passes one of the results of zipWithAndCount as the first argument to zipWithAndCount.

zipCount :: [a] -> [(a, Int)]
zipCount xs = 
    let (count, zipped) = zipWithAndCount count xs
    in zipped

zipWithAndCount :: Num n => b -> [a] -> (n, [(a, b)])
zipWithAndCount y [] = (0, [])
zipWithAndCount y (x:xs) =
    let (count', zipped') = zipWithAndCount y xs
    in (count' + 1, (x, y):zipped')

Running this example makes a list where each item holds the count of the total items in the list

> zipCount ['a'..'e']
[('a',5),('b',5),('c',5),('d',5),('e',5)]

This idea can be applied to Ukkonen's algorithm by passing in the #s that aren't known until the entire result is known.

The general idea of recursively passing a result into a function is called a least fixed point, and is implemented in Data.Function by

fix :: (a -> a) -> a
fix f = let x = f x in x

We can write zipCount in points-free style in terms of zipWithAndCount and fix.

import Data.Function

zipCount :: [a] -> [(a, Int)]
zipCount = snd . fix . (. fst) . flip zipWithAndCount