I'm working on a php script which has to find the longest repeated substring. I found this Suffix-Tree thing. I'm trying to implement Ukkonnen's algorithm, but I can't get when and how to extend the tree.
It's okay if i have new charachter which is not in the tree yet I have to create a new node and egde from the root for it. But how should I know if I have to split up an edge?
I found a C++ implementation of it (link) and I tried to translate it to php but I think I've got a typeo in it because it gives an almost good result, the problem is I can't fix it until I do not understand it fully...
I read a dozen of descriptions of Suffix-Trees but some of them do not go too deep in it, others give me headache after the second sentece.
Here is the code what I have now: Suffix-tree.php (Sorry but this editor couldn't take it) I used this site to check the result.
So any advise would be appreciated...
EDIT: I rewrote it from the JavaScript stuff found on the mentioned site. Here is the link to the source: Suffix-Tree v0.1
A good explanation is given by Matt Mahoney, a data compression expert. But I didn't understand the implementation too, it's quite difficult. FYI I've managed to run a suffix-tree php extension. You can find my code at sourceforge if it helps. I would love to see your final code though!