I have read that :
Searching for a substring, pat[1..m], in txt[1..n], can be solved in O(m) time (after the suffix tree for txt has been built in O(n) time).
but at each point, we will have to choose which branch to take, so like in n-ary tree, at each node, we will have to compare with all max n pointers in that node to decide which branch to take. Will this not bring n factor in complexity of this algorithm, somehow in picture
Then how above it says that substring can be found in O(m)?
What am I missing here?
Then how above it says that substring can be found in O(m)?
By omission. You are correct that the runtime of searching in suffix trees is more complex than merely O(m).
However, it can indeed be sped up to O(m) if we trade off space requirements: we need to get the search at each node down to O(1) and we can do this by using an appropriate data structure (e.g. an array) which gives us the appropriate sub-tree for each letter in constant time.
For instance, assume that you’re using C++ for the implementation and your character (char
) can contain 256 different values. Then the implementation of a node could look as follows:
struct node {
char current_character;
node* children[256];
};
Now, current_character
refers to the character of the branch leading to the current node, and children
is an array of child nodes. During the search, assume that you are currently at node u
, and the next character in the input text is c
. Then you will choose the next node as follows:
node* next = u->children[c];
if (next == 0) {
// Child node does not exist => nothing found.
}
else {
u = next;
// Continue search with next …
}
Of course, this is only viable for very small alphabets (e.g. DNA for genomic sequences). In most common cases, the worst-case runtime of a suffix tree is indeed higher than O(m).