Search code examples
c++dynamic-programmingtraveling-salesman

How to output the lexicographically smallest one shortest superstring?


The problem is : Given n strings si and find out the shortest string S so that every si is a substring in S. And output should be the lexicographically smallest one when there are multiple possible answers.

For example: the testcase:{qgw, qsopd, qwrw, wckumn} The answer should be "qgwckumnqsopdqwrw", but my code output "qgwqsopdqwrwckumn". I think the problem is in solve function that it didn't consider all the possible prev string. Please help me find out where I should modify the code, thank you so much!

Below is my code:

class Solution {
    vector<vector<string>> req;
    string merge(string &a, string &b) 
    {
        int n=a.size(), m=b.size(), len=1, idx=0;
        while(len<=min(n, m))
        {   if(a!=""&&b!=""){
                if(a!=b&&b.find(a)!= string::npos)
                {   a="";
                }
                else if(a.substr(n-len)==b.substr(0, len))
                {
                    idx=len;
                }
            }

            len++;
        }
        
        string res=b.substr(idx); // idx is distance to non-overlap and res is the non-overlap str
        return res;
    }

    string solve(vector<string> &words, int prev, int mask, int n, vector<vector<string>> &dp)
    //prev:parent word, mask:track which word has been selected 
    {   
        if(dp[prev][mask]!="") return dp[prev][mask];// check if the dp is empty
        string res="";
        int minLen=INT_MAX;
        for(int i=0; i<n; i++)
        {
            if(mask&(1<<i)) continue; //check if the i th word has been used
            string temp=req[prev][i]+solve(words, i, mask|(1<<i), n, dp);
            int temp_size=temp.size();
            if(temp_size<minLen) 
            {
                minLen=temp.size();
                res=temp;
            }
        }

        return dp[prev][mask]=res;
    }

public:
    string shortestSuperstring(vector<string>& words)
    {
        int n=words.size();
        sort(words.begin(), words.end());

        req.resize(n, vector<string> (n, ""));
        vector<vector<string>> dp(n, vector<string> ((1<<(n+1)), ""));
        for(int i=0; i<n; i++)
        {
            for(int j=0; j<n; j++)
            {
                if(i==j) continue;
                req[i][j]=merge(words[i], words[j]);
            }
        }

        string ans="";
        int minLen=INT_MAX;
        int mask=0;
        for(int i=0; i<n; i++)
        {
            string temp=words[i]+solve(words, i, mask|(1<<i), n, dp);
            cout<<"temp: "<<temp<<"\n";
            int temp_size=temp.size();
            if(temp_size<minLen) 
            {
                minLen=temp.size();
                ans=temp;
            }

        }
        return ans;
    }
};

Solution

  • The main issue in your existing code is that it does not account for the lexicographically smallest requirement when there are multiple possibilities. It only considers the string length.

    Some modifications you can make to your code to solve this problem:

    1. Change your merge function to return a pair of integers: the length of the overlapping part and the index of the second string. This will allow you to keep track of the lexicographical order.
    2. In your solve function, when comparing temp_size with minLen, also consider the lexicographical order when the lengths are equal. You can achieve this by adding a secondary condition to your if statement like so:
    if(temp_size < minLen || (temp_size == minLen && temp < res)) 
    {
        minLen = temp_size;
        res = temp;
    }
    
    1. In your shortestSuperstring function, similarly add a secondary condition to consider the lexicographical order:
    if(temp_size < minLen || (temp_size == minLen && temp < ans)) 
    {
        minLen = temp_size;
        ans = temp;
    }