
Find the number of subsequences that are greater than another string


Given a string s of length m and a string t of length n, count how many subsequences of s are greater than t.

A sequence p is called greater than another sequence q if either of the following holds:

a) p[i] > q[i] at the first position where p and q differ, or
b) |p| > |q| and q is a prefix of p (where |p| denotes the length of sequence p).

Example:

s ="bab" t ="ab"

Result = 5

Explanation:

Valid subsequences of s which are greater than t are listed below ("b" appears twice because it can be taken from two different positions of s):
"b"
"ba"
"bb"
"bab"
"b"

Constraints: the length of s is between 1 and 10^5; the length of t is between 1 and 100.

t can be longer than s, and there can still be valid subsequences in that case.

I solved it using a recursive approach, but it has O(2^m * m) time complexity, where m is the length of s.

import java.util.ArrayList;
import java.util.List;

public class Main {
    private static final int MOD = 1_000_000_007;

    private static void subsequence(String s, int index, String current, List<String> subsequences) {
        if (index == s.length()) {
            if (!current.isEmpty()) {
                subsequences.add(current);
            }
            return;
        }
        subsequence(s, index + 1, current, subsequences);
        subsequence(s, index + 1, current + s.charAt(index), subsequences);
    }

    private static boolean isGreater(String s1, String t) {
        int len1 = s1.length();
        int len2 = t.length();
        for (int i = 0; i < Math.min(len1, len2); i++) {
            if (s1.charAt(i) > t.charAt(i)) {
                return true;
            } else if (s1.charAt(i) < t.charAt(i)) {
                return false;
            }
        }
        return len1 > len2;
    }

    public static int solve(String s, String t) {
        List<String> subsequences = new ArrayList<>();
        subsequence(s, 0, "", subsequences);

        int count = 0;
        for (String e : subsequences) {
            if (isGreater(e, t)) {
                count = (count + 1) % MOD;
            }
        }

        return count;
    }

    public static void main(String[] args) {
        System.out.println(solve("aba", "ab")); // Expected: 3
        System.out.println(solve("bab", "ab")); // Expected: 5
        System.out.println(solve("wrrmkhds", "bebbjvcgzlwtbvasphvm")); // Expected: 255
        System.out.println(solve("o", "h"));   // Expected: 1
    }
}

How can this be solved with a lower time complexity?


Solution

  • You can use a recurrence relation, and implement that with dynamic programming.

    If we consider a suffix of 𝑠, starting at index 𝑖 and a suffix of 𝑡, starting at index 𝑗 (let's use the notation 𝑠[𝑖:] and 𝑡[𝑗:] for those suffixes) then we have a smaller problem to solve, namely how many subsequences of the first suffix are greater than the second suffix. We can use this result for the bigger problem.

    If we want to know the solution for 𝑠[𝑖:] and 𝑡[𝑗:], then we have a few scenarios:

    • If 𝑠[𝑖:] is empty (i.e. 𝑖 ≥ 𝑚), then there are 0 subsequences.

    • Otherwise, we can split the count of subsequences in two groups:

      1. Subsequences that exclude 𝑠[𝑖]

        This count is equal to the solution for 𝑠[𝑖+1:] and 𝑡[𝑗:] (we just removed 𝑠[𝑖] from the input)

      2. Subsequences that include 𝑠[𝑖]

        • If 𝑗 ≥ 𝑛 or 𝑠[𝑖] > 𝑡[𝑗], then the following characters of 𝑡 don't matter anymore, and we can freely choose which characters of 𝑠[𝑖+1:] to include or not. This represents 2^(𝑚−1−𝑖) possible subsequences, all of which start with 𝑠[𝑖].

        • When 𝑠[𝑖] = 𝑡[𝑗], then the count of subsequences is given by the solution for 𝑠[𝑖+1:] and 𝑡[𝑗+1:]

        • Otherwise (when 𝑠[𝑖] < 𝑡[𝑗]), we made an invalid choice, and so there are 0 subsequences to count for this scenario.

    More formally, define 𝑚 as the length of 𝑠, 𝑛 as the length of 𝑡 and 𝑇(𝑖, 𝑗) as the number of non-empty subsequences of 𝑠[𝑖:] that are greater than 𝑡[𝑗:]. Then:

    • 𝑇(𝑖, 𝑗) = 0, when 𝑖 ≥ 𝑚

    • 𝑇(𝑖, 𝑗) = 𝑇(𝑖+1, 𝑗) + 2^(𝑚−1−𝑖), otherwise, when 𝑗 ≥ 𝑛 or 𝑠[𝑖] > 𝑡[𝑗]

    • 𝑇(𝑖, 𝑗) = 𝑇(𝑖+1, 𝑗) + 𝑇(𝑖+1, 𝑗+1), otherwise, when 𝑠[𝑖] = 𝑡[𝑗]

    • 𝑇(𝑖, 𝑗) = 𝑇(𝑖+1, 𝑗), otherwise (𝑠[𝑖] < 𝑡[𝑗], so only the subsequences that exclude 𝑠[𝑖] are counted)

    In the end, we need the value of 𝑇(0, 0).
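
    As an illustration of the recurrence, here is a minimal sketch that fills the complete table before any memory optimization is applied. The class name FullTableSketch and the method name fullTable are made up for this example. It uses long values without any modulo, so it is only meant for short inputs (𝑠 shorter than about 60 characters).

```java
import java.util.Arrays;

// Hypothetical helper for illustration; not part of the original answer.
class FullTableSketch {
    // Returns the table T, where T[i][j] is the number of non-empty
    // subsequences of s[i:] that are greater than t[j:].
    // Assumption: s is short enough that 2^(m-1) fits in a long.
    static long[][] fullTable(String s, String t) {
        int m = s.length();
        int n = t.length();
        long[][] table = new long[m + 1][n + 1]; // row m stays all zeros (empty suffix of s)
        for (int i = m - 1; i >= 0; i--) {
            // Number of subsequences of s[i:] that start with s[i]
            long startingWithSi = 1L << (m - 1 - i);
            for (int j = n; j >= 0; j--) {
                long count = table[i + 1][j]; // subsequences that exclude s[i]
                if (j >= n || s.charAt(i) > t.charAt(j)) {
                    count += startingWithSi; // rest of t no longer matters
                } else if (s.charAt(i) == t.charAt(j)) {
                    count += table[i + 1][j + 1]; // tie: decided by the remaining suffixes
                }
                // s[i] < t[j]: including s[i] contributes nothing
                table[i][j] = count;
            }
        }
        return table;
    }

    public static void main(String[] args) {
        for (long[] row : fullTable("bab", "ab")) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

    For 𝑠 = "bab" and 𝑡 = "ab" the rows come out as [5, 3, 7], [1, 0, 3], [1, 0, 1] and [0, 0, 0], so the top-left entry is 5, which matches the example in the question.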

    To implement this, we could use a bottom-up approach, starting with an empty suffix of 𝑠 (i.e. 𝑖 = 𝑚), and then grow that suffix (decreasing 𝑖). For each suffix, consider 𝑗 decreasing from 𝑛 to 0. As the value of 𝑇 for (𝑖, 𝑗) only depends directly on the values for (𝑖+1, 𝑗) and (𝑖+1, 𝑗+1), we don't need to store the whole matrix 𝑇: it suffices to keep two consecutive rows of that matrix in memory. This gives O(𝑚𝑛) time and O(𝑛) memory.

    Here is an implementation:

        public static int solve(String s, String t) {
            int m = s.length();
            int n = t.length();
            
            int[] dp = new int[n+1];
            int[] dpPrev;
            
            for (int i = m - 1; i >= 0; i--) { // Grow the suffix of s that is considered
                // Take a copy to have a reference to previous results (for smaller s suffix)
                dpPrev = dp.clone(); 
                // There are 2^(length of s[i:]) - 1 non-empty subsequences 
                //     when t[j:] is empty (j == n):
                dp[n] = 2 * dp[n] + 1;
                // Add the cases where s[i] is included in the subsequences
                for (int j = n - 1; j >= 0; j--) {
                    int cmp = Character.compare(s.charAt(i), t.charAt(j));
                           // Add the count of all subsequences of s[i+1:] (+1 for empty one)
                    dp[j] += cmp >  0 ? dpPrev[n] + 1 
                           // Add the count of subsequences of s[i+1:] greater than t[j+1:]
                           : cmp == 0 ? dpPrev[j+1]
                           : 0;
                }
            } 
            return dp[0];
        }
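
    One way to gain confidence in the implementation above is to compare it with a brute-force count on many small random inputs. The following is only a sketch: the class CrossCheckSketch and its method names are invented for this example, the dynamic programming method is a copy of the one above so that the block is self-contained, and the brute force relies on String.compareTo, which orders strings by the same rules as the question (first differing character, then length).

```java
import java.util.Random;

// Hypothetical test harness for illustration; names are not from the original answer.
class CrossCheckSketch {
    // Two-row dynamic programming, same logic as the solve method above.
    static int solveDp(String s, String t) {
        int m = s.length();
        int n = t.length();
        int[] dp = new int[n + 1];
        for (int i = m - 1; i >= 0; i--) {
            int[] dpPrev = dp.clone();
            dp[n] = 2 * dp[n] + 1;
            for (int j = n - 1; j >= 0; j--) {
                int cmp = Character.compare(s.charAt(i), t.charAt(j));
                dp[j] += cmp > 0 ? dpPrev[n] + 1 : cmp == 0 ? dpPrev[j + 1] : 0;
            }
        }
        return dp[0];
    }

    // Brute force: every non-empty bitmask selects one subsequence of s.
    static int solveBrute(String s, String t) {
        int m = s.length();
        int count = 0;
        for (int mask = 1; mask < (1 << m); mask++) {
            StringBuilder sb = new StringBuilder();
            for (int k = 0; k < m; k++) {
                if ((mask & (1 << k)) != 0) {
                    sb.append(s.charAt(k));
                }
            }
            if (sb.toString().compareTo(t) > 0) {
                count++;
            }
        }
        return count;
    }

    // Builds a random string of length 1..maxLen over the given alphabet.
    static String randomString(Random rnd, int maxLen, String alphabet) {
        int len = 1 + rnd.nextInt(maxLen);
        StringBuilder sb = new StringBuilder();
        for (int k = 0; k < len; k++) {
            sb.append(alphabet.charAt(rnd.nextInt(alphabet.length())));
        }
        return sb.toString();
    }

    // Returns null when both methods agree on every trial,
    // otherwise a description of the first input where they differ.
    static String firstMismatch(int trials, long seed) {
        Random rnd = new Random(seed);
        for (int k = 0; k < trials; k++) {
            String s = randomString(rnd, 10, "abc");
            String t = randomString(rnd, 6, "abc");
            int fast = solveDp(s, t);
            int slow = solveBrute(s, t);
            if (fast != slow) {
                return "s=" + s + " t=" + t + " dp=" + fast + " brute=" + slow;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String mismatch = firstMismatch(2000, 42L);
        System.out.println(mismatch == null ? "all trials agree" : mismatch);
    }
}
```

    The small alphabet ("abc") is a deliberate choice: it makes equal characters, and therefore the tie-breaking branch, show up often in the random inputs.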
    

    NB: your code had a constant 1_000_000_007, which was not mentioned in your question. I assume it will be no problem for you to incorporate the requirement that relates to that constant. I preferred to keep it out so as to focus on the question. Note that without it, the int counts overflow once 𝑠 is longer than 31 characters.
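
    For completeness, here is a sketch of how the constant could be worked in, under the assumption (not stated in the question) that the count must be reported modulo 1_000_000_007. The class name ModularSketch and the method name solveMod are invented for this example. Intermediate values are kept in long so that sums of two reduced values cannot overflow.

```java
// Hypothetical variant for illustration; assumes the answer is wanted modulo 1_000_000_007.
class ModularSketch {
    static final int MOD = 1_000_000_007;

    static int solveMod(String s, String t) {
        int m = s.length();
        int n = t.length();
        long[] dp = new long[n + 1]; // row for the empty suffix of s: all zeros
        for (int i = m - 1; i >= 0; i--) {
            long[] prev = dp.clone(); // row for s[i+1:]
            // Non-empty subsequences of s[i:], i.e. 2^(m-i) - 1, reduced modulo MOD
            dp[n] = (2 * prev[n] + 1) % MOD;
            char c = s.charAt(i);
            for (int j = n - 1; j >= 0; j--) {
                int cmp = Character.compare(c, t.charAt(j));
                if (cmp > 0) {
                    // prev[n] + 1 equals 2^(m-1-i) modulo MOD
                    dp[j] = (prev[j] + prev[n] + 1) % MOD;
                } else if (cmp == 0) {
                    dp[j] = (prev[j] + prev[j + 1]) % MOD;
                }
                // cmp < 0: dp[j] keeps the value of prev[j]
            }
        }
        return (int) dp[0];
    }

    public static void main(String[] args) {
        System.out.println(solveMod("bab", "ab"));
        System.out.println(solveMod("aba", "ab"));
    }
}
```

    Every stored value stays below MOD, so each sum of up to three terms is far below the range of long and a single reduction per update is enough.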