Search code examples
stringalgorithmsuffix-arraystring-algorithm

Which character to append to string in suffix array?


I was solving

https://www.spoj.com/problems/BEADS/

above question at SPOJ. I have stated the relevant information below:

Problem Statement: The description of the necklace is a string A = a1a2 ... am specifying sizes of the particular beads, where the last character am is considered to precede character a1 in circular fashion. The disjoint point i is said to be worse than the disjoint point j if and only if the string aiai+1 ... ana1 ... ai-1 is lexicografically smaller than the string ajaj+1 ... ana1 ... aj-1. String a1a2 ... an is lexicografically smaller than the string b1b2 ... bn if and only if there exists an integer i, i <= n, so that aj=bj, for each j, 1 <= j < i and ai < bi.

Output: For each test case, print exactly one line containing only one integer -- number of the bead which is the first at the worst possible disjoining, i.e. such i, that the string A[i] is lexicographically smallest among all the n possible disjoinings of a necklace. If there are more than one solution, print the one with the lowest i.

Now the solution is using SUFFIX ARRAY. Input string s, and concat with itself, s'=s+s ,since I have to sort cyclic suffixes of array. Then create a suffix array on s', and output the smallest index that points to a suffix of original s, i.e., index < len(s).

But there is a problem I face. I was appending '$' character to get SA, but I was getting wrong answer. After looking online, I found 1 solution that had appended '}' to string. I found that ascii('$') < ascii('a') < ascii('z') < ascii('}')

But i don't understand how this will make a difference, why this is accepted answer and haven;t found a case where this will make a difference. The solution (AC) can be found here:

Link to Code

    #include <bits/stdc++.h>
using namespace std;

string s;int n;
bool cmp_init(int a, int b)
{
    return s[a]<s[b] || (s[a]==s[b] && a<b);
}

int jmp;
vector<int> pos;
bool cmp(int a, int b)
{
    return pos[a]<pos[b] || (pos[a]==pos[b] && pos[(a+jmp)%n]<pos[(b+jmp)%n]);
}

int main() {
    int tc;cin>>tc;
    while(tc--){
    cin>>s;
    int m=s.size();
    s=s+s+"{";
    n=s.size();
    
    vector<int> SA(n,0);
    for(int i=0;i<n;i++)SA[i]=i;
    sort(SA.begin(), SA.end(), cmp_init);
    pos.assign(n,0);
    
    for(int i=1 , c=0;i<n;i++)pos[SA[i]]=(s[SA[i]]==s[SA[i-1]])?c:++c;
        
    for(jmp=1;jmp<=n;jmp*=2)
    {
        
        sort(SA.begin(), SA.end(), cmp);
        
        vector<int> tmp(n,0);
    
        for(int i=1 , c=0;i<n;i++)tmp[SA[i]]=(pos[SA[i]]==pos[SA[i-1]] && pos[(SA[i]+jmp)%n]==pos[(SA[i-1]+jmp)%n])?c:++c;

        for(int i=0;i<n;i++)pos[i]=tmp[i];

    }
    
        for(int i=0;i<n;i++)if(SA[i]<m){cout<<SA[i]+1<<"\n";break;}
    }
    
}


PS.: I have found that SA construction code is correct, only problem is with last character appending. Normally we append '$' in SA construction.


Solution

  • The difference is in the last condition:

    If there are more than one solution, print the one with the lowest i.

    Consider input "abab".

    The correct answer is 0, which you get when you append '}', because "abababab}" is less than all of its suffixes.

    If you append '$', you get the wrong answer, because "ab$" < "abab$" < "ababab$" < "abababab$".