Search code examples
algorithmsuffix-tree

Suffix Tree and Longest Repeated Substring issue


When running the algorithm on the string 'AEKEAAEKEAAEKEA$' looking for the longest substring with at least 3 occurences all the nodes in the suffix tree have maximum 2 branches, how can that be?

The correct result should be the substring 'AEKEA'.

You can easily see the tree in the online suffix tree builder

I just followed the Wikipedi description:

"The problem of finding the longest substring with at least k occurrences can be found by first preprocessing the tree to count the number of leaf descendants for each internal node, and then finding the deepest node with at least k descendants"

What am I missing here?

Thank you.


Solution

  • I don't think that website is correct. When I run 'AEKEAAEKEAAEKEA' through my suffix tree, I get the following tree.

    └── (0)
        ├── (27) $
        ├── (6) A
        │   ├── (26) $
        │   ├── (16) AEKEA
        │   │   ├── (17) $
        │   │   └── (7) AEKEA$
        │   └── (18) EKEA
        │       ├── (19) $
        │       └── (8) AEKEA
        │           ├── (9) $
        │           └── (1) AEKEA$
        ├── (4) E
        │   ├── (24) A
        │   │   ├── (25) $
        │   │   └── (14) AEKEA
        │   │       ├── (15) $
        │   │       └── (5) AEKEA$
        │   └── (20) KEA
        │       ├── (21) $
        │       └── (10) AEKEA
        │           ├── (11) $
        │           └── (2) AEKEA$
        └── (22) KEA
            ├── (23) $
            └── (12) AEKEA
                ├── (13) $
                └── (3) AEKEA$
    

    As you can see from this branch, you've found your longest substring with 3 occurences.

    └── (0)
        ├── (27) $
        ├── (6) A
        │   ├── (26) $
        │   ├── (16) AEKEA
        │   │   ├── (17) $
        │   │   └── (7) AEKEA$
        │   └── (18) EKEA
        │       ├── (19) $
        │       └── (8) AEKEA
        │           ├── (9) $
        │           └── (1) AEKEA$