Search code examples
c++algorithmsearchtree

Find a specific word algorithm in c++


i try to build an algorithm to find a forbidden word which has 4 characters. My assignment is to call the function sendMessage() as rarely as possible.

here are some details about the sendMessage() Function: bool sendMessage(std::string message) The message can be a maximum of 10,000 characters long and consist only of lowercase letters from 'a' to 'z'. The function returns true if the message has been censored; otherwise, it returns false.

The way i wanted to solve this problem is that i just create random words with 4 characters and put them all together in a string with 10000 characters (2500 words * 4 characters each). After that i look if the sendMessage() Function returns true and if that is the case, i apply something like a Tree Search where i divide the message into 2 parts, see if it returns true on one of these, if yes devide again into 2 parts and so on.

My problem now is that my program doesn't find the forbidden word. Here is my Code:

#include "zensur.h"

#include <iostream>
#include <string>
#include <utility> // for std::pair

std::pair<std::string, std::string> splitString(const std::string& inputString) {
    // Determine the length of the string
    size_t length = inputString.length();

    // Check if the length is even or odd
    if (length % 2 == 0) {
        // If the length is even, split the string into two halves
        size_t middle = length / 2;

        std::string part1 = inputString.substr(0, middle);
        std::string part2 = inputString.substr(middle);

        return std::make_pair(part1, part2);
    }
    else {
        // If the length is odd, split the string so that the first part has an additional character
        size_t middle = length / 2;

        std::string part1 = inputString.substr(0, middle + 1);
        std::string part2 = inputString.substr(middle + 1);

        return std::make_pair(part1, part2);
    }
}

std::string findBannedWord() {
    const int wordLength = 4;
    std::string concatenatedWords;

    for (char a = 'a'; a <= 'z'; ++a) {
        for (char b = 'a'; b <= 'z'; ++b) {
            for (char c = 'a'; c <= 'z'; ++c) {
                for (char d = 'a'; d <= 'z'; ++d) {

                    // make a potential word
                    std::string potentialWord = { a, b, c, d }; 

                    
                    if (concatenatedWords.length() < (10000 - wordLength)) {    
                        // put 2500 potential words together to a concatenated Word
                        concatenatedWords += potentialWord;
                    }

                    // if the concatenated Word is 10000 long (2500*4) then look if the forbidden word is in it
                    if (concatenatedWords.length() == 10000) {  
                        if (sendMessage(concatenatedWords)) {   

                        // if it is in it then apply something like a tree structure to find it.
                            while (true) {
                                // split it into 2 substrings
                                std::pair<std::string, std::string> innerParts = splitString(concatenatedWords);    

                                // if it is in the first substring
                                if (sendMessage(innerParts.first)) {                
                                    concatenatedWords = innerParts.first;
                                    continue;
                                }

                                // if it is in the second substring
                                else if (sendMessage(innerParts.second)) {            
                                    concatenatedWords = innerParts.second;
                                    continue;
                                }

                                // if it is in none of those substrings then look at the end of first substring and beginning of second substring combined
                                else {                                              
                                    size_t length = innerParts.first.length();
                                    std::string endString = innerParts.first.substr(length - 4);
                                    std::string startString = innerParts.second.substr(0, 4);

                                    // put the end of first substring and beginning of second  
                                    // substring together
                                    std::string realString = endString + startString;   
    
                                    // go through all 5 possibilitys
                                    for (int i = 0; i < ((realString.length() - wordLength) + 1); ++i) {        
                                        std::string substring = realString.substr(i, wordLength);

                                        // if we found the forbidden word return it
                                        if (sendMessage(substring)) {       
                                            return substring;
                                        }
                                    }
                                }
                            }
                        }
                        // if forbidden word is not in there clear concatenatedWords and try again
                        concatenatedWords = ""; 
                    }
                }
            }
        }
    }

    return ""; // return empty string if word is not found
}

I would be very grateful if someone can help me find the mistake in my code. I personally think my error is somewhere in this section:

 // if it is in none of those substrings then look at the end of first substring and beginning of second substring combined
else {                                             
    size_t length = innerParts.first.length();
    std::string endString = innerParts.first.substr(length - 4);
    std::string startString = innerParts.second.substr(0, 4);

    // put the end of first substring and beginning of second substring together
    std::string realString = endString + startString;       

    // go through all 5 possibilitys
    for (int i = 0; i < ((realString.length() - wordLength) + 1); ++i) {        
        std::string substring = realString.substr(i, wordLength);

        / if we found the forbidden word return it
        if (sendMessage(substring)) {       /
            return substring;
        }
    }

Solution

  • You've implemented binary search, which is a good idea.

    There are a few issues though:

    • The condition to extend concatenatedWords with four more letters is not good:

      if (concatenatedWords.length() < (10000 - wordLength)) {
      

      This means that when the length of concatenatedWords is 9996 no word will be added to it, and thus the following condition will never be true:

      else if (concatenatedWords.length() == 10000) {
      

      ...meaning that the binary search will never be performed.

      The correction is that you should also add the word when there is equality in that first if condition. So do:

      if (concatenatedWords.length() <= (10000 - wordLength)) { // fix
      
    • In the binary search part, in the special case where the word is neither found in the left nor the right partition, you have this expression:

      innerParts.first.substr(length - 4)
      

      This expression will raise an error when innerParts.first has a size that is less than 4, which is well possible. I could reproduce it by testing with the banned word "zzyx". This should be corrected to:

      innerParts.first.substr(length > 3 ? length - 4 : 0)
      

      but see next point.

    • Not a problem, but in that special case where the banned word is being cut by the partitioning, you only need to treat 3 cases, not 5. So instead of length - 4, you only need length - 3, and similarly, only substr(0, 3) and not substr(0, 4).

    • Related to the above: you have the constant wordLength, but you don't use it consistently, like for instance in determining the size of the substrings in the above mentioned case.

    • Since the number of 4 letter words is not a multiple of 10000, you might end up* with a remainder when exiting the four nested loops without success. It would then be necessary to find the banned word in that remaining string. So I would replace the return ""; at the very end with another execution of the binary search algorithm. That brings me to the following point.

      * I am not entirely sure this case will actually happen, because the way you build the string, there will be several occurrences of the same word. For instance, in the very first chunk, the word aaaa will occur at index 0, index 1 and index 2, because that chunk starts with aaaaaaabaaac. Similarly zzzz will appear in the concatenation of yzzz and zaaa.

    • Not a problem, but the code is rather deeply nested. This should be an opportunity to separate some code into a new function. The obvious candidate for this is the binary search algorithm. This will also make it easy to call it again at the very end in case there was no match in the chunks of 10000 characters.

    This leads to the following code:

    const int wordLength = 4; // Used in the two functions below
    
    // Isolate the binary search logic in a separate function
    std::string binarySearch(std::string concatenatedWords) {
        while (true) {
            // split it into 2 substrings
            std::pair<std::string, std::string> innerParts = splitString(concatenatedWords);
            // if it is in the first substring
            if (sendMessage(innerParts.first)) {                
                concatenatedWords = innerParts.first;
                continue;
            }
            // if it is in the second substring
            if (sendMessage(innerParts.second)) {            
                concatenatedWords = innerParts.second;
                continue;
            }
            else {
                size_t length = innerParts.first.length();
                // Fix to avoid a negative argument in substr. 
                // Fix to use the wordLength constant.
                std::string endString = innerParts.first.substr(length >= wordLength
     ? length - wordLength + 1 : 0);
                std::string startString = innerParts.second.substr(0, wordLength - 1);
        
                std::string realString = endString + startString;   
    
                // Fix to use the wordLength constant
                for (int i = 0; i < ((realString.length() - wordLength) + 1); ++i) {        
                    std::string substring = realString.substr(i, 4);
                    if (sendMessage(substring)) {       
                        return substring;
                    }
                }
            }
        }
        return "";
    }
    
    
    std::string findBannedWord() {
        std::string concatenatedWords;
    
        for (char a = 'a'; a <= 'z'; ++a) {
            for (char b = 'a'; b <= 'z'; ++b) {
                for (char c = 'a'; c <= 'z'; ++c) {
                    for (char d = 'a'; d <= 'z'; ++d) {
                        std::string potentialWord = { a, b, c, d };
                        // Fixed comparison:
                        if (concatenatedWords.length() <= (10000 - wordLength)) {
                            concatenatedWords += potentialWord;
                        }
                        else if (concatenatedWords.length() == 10000) {
                            if (sendMessage(concatenatedWords)) {
                                return binarySearch(concatenatedWords);
                            }
                            concatenatedWords = "";
                        }
                    }
                }
            }
        }
        // Also look in the "left over"
        return binarySearch(concatenatedWords); 
    }
    

    There is room for optimisation because it is possible to use shorter strings. Potentially at every single index you could have a new 4-letter word starting. Almost every 4-letter word now appears multiple times in your concatenated string, and this is a waste of space, leading to a few more calls of the sendMessage function in the worst case than might be strictly necessary.