c++string algorithm time-complexity suffix-array

How to find ith substring of a string using suffix array and LCP array?

If we arrange all the distinct sub-strings of a string lexicographically and we need the i'th substring

1.) Is it possible to find it using it's suffix array and LCP array?

2.) If Yes, how do we do it ? can it be done in O(Nlog^N) while creating the suffix array using Manber & Myers which has a time complexity of O(Nlog^2N) or while creating it's LCP array using kasai's algorithm which has a time complexity of O(N)?

Solution

Yes it can be done using Suffix array and LCP array.

Assuming you know how to calculate Suffix array and LCP array.

Let p[] denote suffix array lcp[] denote LCP array.

create a array which store the number of distinct sub string till i'th rank suffix. This can be calculated using this formula. For more details see Here

Let cum[] denote cumulative array which can be calculated as follow:

cum[0] = n - p[0];
for i = 1 to n do:
    cum[i] = cum[i-1] + (n - p[i] - lcp[i])

Now for finding i'th sub string just find the lower bound of i in cumulative array cum[] that will give you rank of suffix from where your sub string should start and print all character till length of

i - cum[pos-1] + lcp[pos] // i lies between cum[pos-1] and cum[pos] so for finding 
                          // length of sub string starting from cum[pos-1] we should 
                          // subtract cum[pos-1] from i and add lcp[pos] as it is 
                          // common string between current rank suffix and 
                          // previous rank suffix.

where pos is value return by lower bound.

Whole above process can be summarized as follow:

string ithSubstring(int i){
    pos = lower_bound(cum , cum + n , i);
    return S.substr(arr[pos] , i - cum[pos-1] + lcp[pos]);// considering S as original character string 
}

For full implementation of Suffix array , LCP and above logic you can see Here