Tags: c++, vector, ascii, frequency-analysis, caesar-cipher

Caesar Cipher w/Frequency Analysis how to proceed next?


I understand this has been asked before, and I somewhat have a grasp of how to compare frequency tables between the ciphertext and English (the language I'm assuming the text is in for my program), but I'm unsure how to get this into code.

// Requires <cctype>, <string>, <vector>. freqArg must already hold 26 counters (A..Z).
void frequencyUpdate(std::vector<std::vector<std::string>> &file, std::vector<int> &freqArg) {
    for (auto &line : file) {
        for (auto &word : line) {
            for (char &ch : word) {
                // Upper-case in place. Cast to unsigned char first: passing a
                // negative char value to toupper is undefined behavior.
                ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));

                // Count only 'A'..'Z' (ASCII 65..90); index 0 is 'A', index 25 is 'Z'.
                if (ch >= 'A' && ch <= 'Z') {
                    freqArg.at(ch - 'A') += 1;
                }
            }
        }
    }
}

This is how I get the letter frequencies of a given file whose contents are split into lines and then into words, hence the nested vector of strings; I use each char's ASCII value minus 65 as the index. The resulting vector of ints that holds the frequencies is saved.

Now is where I don't know how to proceed. Should I hardcode a const std::vector<int> for the English letter frequencies and then somehow do a comparison? How would I compare efficiently, rather than simply comparing each vector to the other, which is possibly not an efficient method?

This comparison is for finding an appropriate shift value to decrypt a Caesar-cipher text. I don't want to use brute force and shift one at a time until the text is readable. Any advice on how to approach this? Thanks.


Solution

  • Take your frequency vector and the frequency vector for "typical" English text, and find the circular cross-correlation between them, which gives one score per candidate shift (0 to 25).

    The highest values of the cross-correlation correspond to the most likely shift values. At that point you'll need to use each one to decrypt, and see whether the output is sensible (i.e. forms real words and coherent sentences).
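
A minimal sketch of that idea follows. It is not a definitive implementation. It assumes the cipher frequency vector has 26 entries indexed A..Z (as produced by frequencyUpdate in the question) and that encryption shifted each letter forward by the key. The names kEnglishFreq, crossCorrelate, rankShifts and decrypt are hypothetical helpers introduced here for illustration, and the English percentages are commonly quoted approximate figures, not exact values.

```cpp
#include <algorithm>
#include <array>
#include <numeric>
#include <string>
#include <vector>

// Assumption: approximate relative frequencies (percent) of A..Z in
// typical English text. Any similar table should work.
const std::array<double, 26> kEnglishFreq = {{
    8.167, 1.492, 2.782, 4.253, 12.702, 2.228, 2.015, 6.094, 6.966,
    0.153, 0.772, 4.025, 2.406, 6.749,  7.507, 1.929, 0.095, 5.987,
    6.327, 9.056, 2.758, 0.978, 2.360,  0.150, 1.974, 0.074}};

// Circular cross-correlation:
//   score[s] = sum over i of english[i] * cipher[(i + s) % 26]
// A high score[s] suggests plaintext letter i was encrypted as letter i + s.
// cipherFreq must hold at least 26 entries (at() throws otherwise).
std::array<double, 26> crossCorrelate(const std::vector<int> &cipherFreq) {
    std::array<double, 26> score{};
    for (int shift = 0; shift < 26; ++shift) {
        for (int i = 0; i < 26; ++i) {
            score[shift] += kEnglishFreq[i] * cipherFreq.at((i + shift) % 26);
        }
    }
    return score;
}

// Returns all 26 shifts ordered from most to least likely.
std::vector<int> rankShifts(const std::vector<int> &cipherFreq) {
    const auto score = crossCorrelate(cipherFreq);
    std::vector<int> order(26);
    std::iota(order.begin(), order.end(), 0);  // 0, 1, ..., 25
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int b) { return score[a] > score[b]; });
    return order;
}

// Undoes a Caesar shift (0..25) on upper-case letters; other characters
// are left untouched.
std::string decrypt(const std::string &text, int shift) {
    std::string out = text;
    for (char &ch : out) {
        if (ch >= 'A' && ch <= 'Z') {
            ch = static_cast<char>('A' + (ch - 'A' - shift + 26) % 26);
        }
    }
    return out;
}
```

The whole comparison is 26 x 26 multiplications, so efficiency is not a concern here. In practice you would try decrypt with the first few entries of rankShifts rather than only the top one, since short texts can have letter distributions that differ from the reference table.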