Search code examples
javac++recursiondynamic-programmingmemoization

C++ vs Java Memoization Discrepancy


So I've been trying to solve the word break Dynamic Programming problem, which basically means that given a dictionary of strings and a string, see if the words in the dictionary can be combined to form the string. For example, given word "applepenapple" and dictionary ["apple","pen"] it should return true.

I have a working java solution, but I'm trying to improve my C++ skills. My problem is even though my code looks extremely similar to the working solution in Java, I am failing a niche test case and I can't figure out why.

C++ code:

bool wordBreak(string s, vector<string> &wordDict) {
    vector<int> bArr(s.length(), -1);
    unordered_set<string> set(wordDict.begin(), wordDict.end());
    return wordBreak(s, bArr, 0, set);
}

bool wordBreak(string s, vector<int> &bArr, int start, unordered_set<string> &set) {
    if (start == s.length())
        return true;
    //If we have a memoized solution to this problem, avoid recurion
    if (bArr[start] != -1)
        return (bArr[start] == 1);
    for (int end = start + 1; end <= s.length(); end++) {
        if (set.count(s.substr(start, end)) && wordBreak(s, bArr, end, set)) {
            bArr[start] = 1;
            return bArr[start] == 1;
        }
    }
    bArr[start] = 0;
    return false;
}

Working code using java:

public boolean wordBreak(String s, List<String> wordDict) {
    Integer[] memo =new Integer[s.length()];
    Arrays.fill(memo,-1);
    return word_Break(s, new HashSet(wordDict), 0, memo);
}

public boolean word_Break(String s, Set<String> wordDict, int start, Integer[] memo) {
    if (start == s.length()) {
        return true;
    }
    if (memo[start] != -1) {
        return memo[start]==1;
    }
    for (int end = start + 1; end <= s.length(); end++) {
        if (wordDict.contains(s.substring(start, end)) && word_Break(s, wordDict, end, memo)) {
            memo[start] = 1;
            return memo[start] == 1;
        }
    }
    memo[start] = 0;
    return false;
}

The C++ code is returning false for "applepenapple" with dictionary ["apple","pen"] and I don't know why since the java return true which is correct. The only major difference (I think) between the two solutions is that my C++ uses a vector instead of a native array in the java code. Initially I thought it might have to do with C++ using automatic storage (stack) vs the free store (heap) which is why I used the vector instead of a C-style array to avoid memory management because of RAII. Despite this change, the bug persists. There is an easier solution avoiding recursion altogether, but I am very curious as to why the C++ is returning different output than the java.


Solution

  • I see a potential problem. From java.lang.String Javadoc (emphasis mine):

    public String substring(int beginIndex, int endIndex)

    Returns a new string that is a substring of this string. The substring begins at the specified beginIndex and extends to the character at index endIndex - 1. Thus the length of the substring is endIndex-beginIndex.

    Examples:

    "hamburger".substring(4, 8) returns "urge"
    "smiles".substring(1, 5) returns "mile"
    

    Parameters:

    beginIndex - the beginning index, inclusive.

    endIndex - the ending index, exclusive.

    From cppreference.com documentation on strings:

    basic_string substr( size_type pos = 0, size_type count = npos ) const;

    Returns a substring [pos, pos+count). If the requested substring extends past the end of the string, or if count == npos, the returned substring is [pos, size()).

    Parameters

    pos - position of the first character to include

    count - length of the substring

    That is, in Java you should pass an index as second parameter to String.substring(...), but in C++ you should pass a length to basic_string::substr(...). However, you are doing:

    s.substr(start, end)
    

    and

    s.substring(start, end)
    

    in both cases.

    Maybe adjusting the C++ invocation to

    s.substr(start, end - start)
    

    will work?