i try to build an algorithm to find a forbidden word which has 4 characters. My assignment is to call the function sendMessage() as rarely as possible.
here are some details about the sendMessage() Function: bool sendMessage(std::string message) The message can be a maximum of 10,000 characters long and consist only of lowercase letters from 'a' to 'z'. The function returns true if the message has been censored; otherwise, it returns false.
The way i wanted to solve this problem is that i just create random words with 4 characters and put them all together in a string with 10000 characters (2500 words * 4 characters each). After that i look if the sendMessage() Function returns true and if that is the case, i apply something like a Tree Search where i divide the message into 2 parts, see if it returns true on one of these, if yes devide again into 2 parts and so on.
My problem now is that my program doesn't find the forbidden word. Here is my Code:
#include "zensur.h"
#include <iostream>
#include <string>
#include <utility> // for std::pair
std::pair<std::string, std::string> splitString(const std::string& inputString) {
// Determine the length of the string
size_t length = inputString.length();
// Check if the length is even or odd
if (length % 2 == 0) {
// If the length is even, split the string into two halves
size_t middle = length / 2;
std::string part1 = inputString.substr(0, middle);
std::string part2 = inputString.substr(middle);
return std::make_pair(part1, part2);
}
else {
// If the length is odd, split the string so that the first part has an additional character
size_t middle = length / 2;
std::string part1 = inputString.substr(0, middle + 1);
std::string part2 = inputString.substr(middle + 1);
return std::make_pair(part1, part2);
}
}
std::string findBannedWord() {
const int wordLength = 4;
std::string concatenatedWords;
for (char a = 'a'; a <= 'z'; ++a) {
for (char b = 'a'; b <= 'z'; ++b) {
for (char c = 'a'; c <= 'z'; ++c) {
for (char d = 'a'; d <= 'z'; ++d) {
// make a potential word
std::string potentialWord = { a, b, c, d };
if (concatenatedWords.length() < (10000 - wordLength)) {
// put 2500 potential words together to a concatenated Word
concatenatedWords += potentialWord;
}
// if the concatenated Word is 10000 long (2500*4) then look if the forbidden word is in it
if (concatenatedWords.length() == 10000) {
if (sendMessage(concatenatedWords)) {
// if it is in it then apply something like a tree structure to find it.
while (true) {
// split it into 2 substrings
std::pair<std::string, std::string> innerParts = splitString(concatenatedWords);
// if it is in the first substring
if (sendMessage(innerParts.first)) {
concatenatedWords = innerParts.first;
continue;
}
// if it is in the second substring
else if (sendMessage(innerParts.second)) {
concatenatedWords = innerParts.second;
continue;
}
// if it is in none of those substrings then look at the end of first substring and beginning of second substring combined
else {
size_t length = innerParts.first.length();
std::string endString = innerParts.first.substr(length - 4);
std::string startString = innerParts.second.substr(0, 4);
// put the end of first substring and beginning of second
// substring together
std::string realString = endString + startString;
// go through all 5 possibilitys
for (int i = 0; i < ((realString.length() - wordLength) + 1); ++i) {
std::string substring = realString.substr(i, wordLength);
// if we found the forbidden word return it
if (sendMessage(substring)) {
return substring;
}
}
}
}
}
// if forbidden word is not in there clear concatenatedWords and try again
concatenatedWords = "";
}
}
}
}
}
return ""; // return empty string if word is not found
}
I would be very grateful if someone can help me find the mistake in my code. I personally think my error is somewhere in this section:
// if it is in none of those substrings then look at the end of first substring and beginning of second substring combined
else {
size_t length = innerParts.first.length();
std::string endString = innerParts.first.substr(length - 4);
std::string startString = innerParts.second.substr(0, 4);
// put the end of first substring and beginning of second substring together
std::string realString = endString + startString;
// go through all 5 possibilitys
for (int i = 0; i < ((realString.length() - wordLength) + 1); ++i) {
std::string substring = realString.substr(i, wordLength);
/ if we found the forbidden word return it
if (sendMessage(substring)) { /
return substring;
}
}
You've implemented binary search, which is a good idea.
There are a few issues though:
The condition to extend concatenatedWords
with four more letters is not good:
if (concatenatedWords.length() < (10000 - wordLength)) {
This means that when the length of concatenatedWords
is 9996 no word will be added to it, and thus the following condition will never be true:
else if (concatenatedWords.length() == 10000) {
...meaning that the binary search will never be performed.
The correction is that you should also add the word when there is equality in that first if
condition. So do:
if (concatenatedWords.length() <= (10000 - wordLength)) { // fix
In the binary search part, in the special case where the word is neither found in the left nor the right partition, you have this expression:
innerParts.first.substr(length - 4)
This expression will raise an error when innerParts.first
has a size that is less than 4, which is well possible. I could reproduce it by testing with the banned word "zzyx". This should be corrected to:
innerParts.first.substr(length > 3 ? length - 4 : 0)
but see next point.
Not a problem, but in that special case where the banned word is being cut by the partitioning, you only need to treat 3 cases, not 5. So instead of length - 4
, you only need length - 3
, and similarly, only substr(0, 3)
and not substr(0, 4)
.
Related to the above: you have the constant wordLength
, but you don't use it consistently, like for instance in determining the size of the substrings in the above mentioned case.
Since the number of 4 letter words is not a multiple of 10000, you might end up* with a remainder when exiting the four nested loops without success. It would then be necessary to find the banned word in that remaining string. So I would replace the return "";
at the very end with another execution of the binary search algorithm. That brings me to the following point.
* I am not entirely sure this case will actually happen, because the way you build the string, there will be several occurrences of the same word. For instance, in the very first chunk, the word aaaa
will occur at index 0, index 1 and index 2, because that chunk starts with aaaaaaabaaac
. Similarly zzzz
will appear in the concatenation of yzzz
and zaaa
.
Not a problem, but the code is rather deeply nested. This should be an opportunity to separate some code into a new function. The obvious candidate for this is the binary search algorithm. This will also make it easy to call it again at the very end in case there was no match in the chunks of 10000 characters.
This leads to the following code:
const int wordLength = 4; // Used in the two functions below
// Isolate the binary search logic in a separate function
std::string binarySearch(std::string concatenatedWords) {
while (true) {
// split it into 2 substrings
std::pair<std::string, std::string> innerParts = splitString(concatenatedWords);
// if it is in the first substring
if (sendMessage(innerParts.first)) {
concatenatedWords = innerParts.first;
continue;
}
// if it is in the second substring
if (sendMessage(innerParts.second)) {
concatenatedWords = innerParts.second;
continue;
}
else {
size_t length = innerParts.first.length();
// Fix to avoid a negative argument in substr.
// Fix to use the wordLength constant.
std::string endString = innerParts.first.substr(length >= wordLength
? length - wordLength + 1 : 0);
std::string startString = innerParts.second.substr(0, wordLength - 1);
std::string realString = endString + startString;
// Fix to use the wordLength constant
for (int i = 0; i < ((realString.length() - wordLength) + 1); ++i) {
std::string substring = realString.substr(i, 4);
if (sendMessage(substring)) {
return substring;
}
}
}
}
return "";
}
std::string findBannedWord() {
std::string concatenatedWords;
for (char a = 'a'; a <= 'z'; ++a) {
for (char b = 'a'; b <= 'z'; ++b) {
for (char c = 'a'; c <= 'z'; ++c) {
for (char d = 'a'; d <= 'z'; ++d) {
std::string potentialWord = { a, b, c, d };
// Fixed comparison:
if (concatenatedWords.length() <= (10000 - wordLength)) {
concatenatedWords += potentialWord;
}
else if (concatenatedWords.length() == 10000) {
if (sendMessage(concatenatedWords)) {
return binarySearch(concatenatedWords);
}
concatenatedWords = "";
}
}
}
}
}
// Also look in the "left over"
return binarySearch(concatenatedWords);
}
There is room for optimisation because it is possible to use shorter strings. Potentially at every single index you could have a new 4-letter word starting. Almost every 4-letter word now appears multiple times in your concatenated string, and this is a waste of space, leading to a few more calls of the sendMessage
function in the worst case than might be strictly necessary.