So I've been trying to solve the word break Dynamic Programming problem, which basically means that given a dictionary of strings and a string, see if the words in the dictionary can be combined to form the string. For example, given word "applepenapple" and dictionary ["apple","pen"] it should return true.
I have a working java solution, but I'm trying to improve my C++ skills. My problem is even though my code looks extremely similar to the working solution in Java, I am failing a niche test case and I can't figure out why.
C++ code:
bool wordBreak(string s, vector<string> &wordDict) {
vector<int> bArr(s.length(), -1);
unordered_set<string> set(wordDict.begin(), wordDict.end());
return wordBreak(s, bArr, 0, set);
}
bool wordBreak(string s, vector<int> &bArr, int start, unordered_set<string> &set) {
if (start == s.length())
return true;
//If we have a memoized solution to this problem, avoid recurion
if (bArr[start] != -1)
return (bArr[start] == 1);
for (int end = start + 1; end <= s.length(); end++) {
if (set.count(s.substr(start, end)) && wordBreak(s, bArr, end, set)) {
bArr[start] = 1;
return bArr[start] == 1;
}
}
bArr[start] = 0;
return false;
}
Working code using java:
public boolean wordBreak(String s, List<String> wordDict) {
Integer[] memo =new Integer[s.length()];
Arrays.fill(memo,-1);
return word_Break(s, new HashSet(wordDict), 0, memo);
}
public boolean word_Break(String s, Set<String> wordDict, int start, Integer[] memo) {
if (start == s.length()) {
return true;
}
if (memo[start] != -1) {
return memo[start]==1;
}
for (int end = start + 1; end <= s.length(); end++) {
if (wordDict.contains(s.substring(start, end)) && word_Break(s, wordDict, end, memo)) {
memo[start] = 1;
return memo[start] == 1;
}
}
memo[start] = 0;
return false;
}
The C++ code is returning false for "applepenapple" with dictionary ["apple","pen"] and I don't know why since the java return true which is correct. The only major difference (I think) between the two solutions is that my C++ uses a vector instead of a native array in the java code. Initially I thought it might have to do with C++ using automatic storage (stack) vs the free store (heap) which is why I used the vector instead of a C-style array to avoid memory management because of RAII. Despite this change, the bug persists. There is an easier solution avoiding recursion altogether, but I am very curious as to why the C++ is returning different output than the java.
I see a potential problem. From java.lang.String
Javadoc (emphasis mine):
public String substring(int beginIndex, int endIndex)
Returns a new string that is a substring of this string. The substring begins at the specified beginIndex and extends to the character at index endIndex - 1. Thus the length of the substring is endIndex-beginIndex.
Examples:
"hamburger".substring(4, 8) returns "urge" "smiles".substring(1, 5) returns "mile"
Parameters:
beginIndex - the beginning index, inclusive.
endIndex - the ending index, exclusive.
From cppreference.com documentation on strings:
basic_string substr( size_type pos = 0, size_type count = npos ) const;
Returns a substring [pos, pos+count). If the requested substring extends past the end of the string, or if count == npos, the returned substring is [pos, size()).
Parameters
pos - position of the first character to include
count - length of the substring
That is, in Java you should pass an index as second parameter to String.substring(...)
, but in C++ you should pass a length to basic_string::substr(...)
. However, you are doing:
s.substr(start, end)
and
s.substring(start, end)
in both cases.
Maybe adjusting the C++ invocation to
s.substr(start, end - start)
will work?