What are the maximum and minimum number of edges in a suffix tree? I know the maximum is 2m-1, but I don't understand why that is so.
First, about the maximum number of edges:
It's quite easy to understand if you think of the edges as coming in two flavours: Edges that lead to a leaf node, and edges that lead to an inner node. In the following, I'll assume that the string the suffix tree is constructed for is N
characters in length.
About the edges leading to leaves. There must be exactly one leaf for every suffix, and every leaf has only one inbound edge (and no outbound edges). So there must be N
edges leading to leaves.
About the edges leading to inner nodes. Like with leaf nodes, inner nodes, too, have only one inbound edge each. Therefore, to determine how many edges leading to inner nodes there can possibly be, it suffices to determine how many inner nodes there can be. So, what is the maximum possible number of inner nodes?
For this it is important to see that inner nodes are only inserted into the suffix tree at branching points, i.e. the number of outbound edges of an inner node is always at least 2 (if there was only one outbound edge, the inner node wouldn't have been constructed in the first place). But every outbound edge must eventually lead to a leaf node (possibly after going through further inner nodes). In other words, every inner node inserted into the tree increases the total number of leaf nodes by at least 1. Furthermore, even a tree with no inner nodes whatsoever must have at least one edge going out of the root node (unless the tree is empty). Hence, in a non-empty tree, the total number of leaves L
must be
L >= I + 1
where I
is the number of inner nodes. Conversely, the number of inner nodes is
I <= L - 1 = N - 1
Returning to the original question, the number of edges leading to inner nodes is, as we said, the same as the number of inner nodes, hence it is also bound by N - 1
.
We conclude that the total number of edges is the number of edges leading to leaves (N
) plus the number of edges leading to inner nodes (<=N-1
), hence its maximum is bound by
N + (N-1) = 2N - 1
quod erat demonstrandum.
Regarding the minimum number of edges: This follows the same principle, i.e. we calculate the number of edges leading to leaves and the number of edges leading to internal nodes, and then add them together.
The number of nodes leading to leaves is always N
, i.e. N
is the maximum as well as the minimum possible.
The number of nodes leading to internal nodes, however, may be zero. E.g. when the input string has no repeated elements, such as abcdef
, there are exactly N
edges, each from the root straight to a leaf. No branching points, no internal nodes. Hence the minimum number of edges leading to internal nodes is 0.
In conclusion, the minimum number of edges is N + 0 = N
.