Search code examples
pythonalgorithmword-break

maximum word split


Given a string s and a dictionary of valid words d, determine the largest number of valid words the string can be split up into.

I tried solving this problem with the code below but it is not giving me the answer I am looking for.

def word_split_dp(s):
    n = len(s)
    ans = [0]*n
    # base case
    ans[0] = 0
    for i in range(1, len(s)):
        ans[i] = float('-inf')
        for j in range(1, i-1):
            ans[i]= max(ans[i], ans[i-j] + wordcheck(s,i,j))
   
    print(ans)
    return ans[n-2]
       
        
       
        
       
        
def wordcheck(s,i,j):
    
    for word in dict:
        start = i-j+1
        if(s[start:i] == word):
            return 1
        else:
            return float('-inf')
        

print(word_split_dp(s))

Solution

  • There are a few issues in your attempt. Starting from the top:

    • Assuming that ans[i] represents the maximum number of partitions of the substring s[0:i] into valid words, you'll need to make this list one entry longer, so that there is an ans[n], which will eventually contain the answer, i.e. the maximum number of partitions in s[0:n] which is s. So change:

      ans = [0]*n
      

      to

      ans = [0]*(n+1)
      
    • Because of the reasoning given in the first bullet point, the final return should be return ans[n] instead of return ans[n-2].

    • Again because of the reasoning given in the first billet point, the outer loop should make one more iteration, so change:

      for i in range(1, len(s)):
      

      to

      for i in range(1, len(s)+1):
      
    • Assuming that j represents the size of the substring to test, the inner loop should allow i-j to eventually reach back to 0, so j should be able to reach the value i. That means the inner loop should make two more iterations. So change:

      for j in range(1, i-1):
      

      to

      for j in range(1, i+1):
      
    • In wordcheck, the slice from s should start at i-j, as j represents the size of the slice. So change:

      start = i-j+1
      

      to

      start = i-j
      
    • In wordcheck, the loop will always exit in its first iteration. It cannot loop more as it always executes a return. You should not exit the loop when there is no success. The return float('-inf') should only be made when all words have been tried. So remove:

          else:
              return float('-inf')
      

      and instead add at after the loop:

      return float('-inf')
      

    Those are the issues. The code with all those corrections:

    def word_split_dp(s):
        n = len(s)
        ans = [0]*(n+1)  #correction
        # base case
        ans[0] = 0
        for i in range(1, len(s) + 1): # correction
            ans[i] = float('-inf')
            for j in range(1, i + 1): # correction
                ans[i] = max(ans[i], ans[i-j] + wordcheck(s,i,j))
       
        print(ans)
        return ans[n] # correction
            
    def wordcheck(s,i,j):
        words=("war","month","on","the","heat","eat","he","arm","at")
        for word in words:
            start = i-j # correction
            if s[start:i] == word:
                return 1
            # don't interrupt loop otherwise
        return float('-inf')
            
    s="warmontheheat"
    print(word_split_dp(s))  # -inf
    s="warontheheat"
    print(word_split_dp(s))  # 5