Search code examples
c++encryptioncounterfrequency-analysis

Count sequences of letters in a string - C++


I'm working on a simple substitution-cipher decoder. I'm using frequency analysis to decrypt the ciphertext. Just looking at the frequency of unique letter isn't enough. I need to look at the occurrences of 2-letter sequences (maybe 3-letter sequences).

My code for counting the occurrences of each letter is below

int counterRaw[256][2] = {0};
    for(int i = 0; i <inputString.length(); i++)
        counterRaw[inputString[i]][1]++;

int counterLetter[26][2] = {0};
    for(int i = 0 ; i<26 ; i++){
        counterLetter[i][0] = 'A'+i;
        counterLetter[i][1] = counterRaw['A'+i][1];

As you can see very simple yet effective !

But I don't know how to achieve a 2-letter sequence counter, do you have any idea which could help me to code this ?

Thanks !

EDIT : As an example Given AZAZ RTYU JKLM I want my program to output :

AZ : 2
ZA : 1
ZR : 1
RT : 1
...

Solution

  • Something like the following would do the trick, though you'd have to do some jiggery pokery to make it suit your own needs.

    #include <iostream>
    #include <map>
    #include <string>
    
    int main ()
    {
        std::string message("some string that you will probably get from some encrypted file");
        std::map<std::string,int> occurences;
    
        std::string seq("  ");
        for(int i = 1; i < message.length() - 1; i++)
        {
            seq[0] = message[i-1];
            seq[1] = message[i];
    
            //ignore spaces
            if (seq.compare(0,1, " ") && seq.compare(1,1, " "))
            {
                occurences[seq]++;
            }
        }
    
        //let's have a look ...
        for(auto iter = occurences.begin(); iter != occurences.end(); ++iter)
        {
            std::cout << iter->first << "   " << iter->second << "\n";
        }
        return 0;
    }
    

    output:

    ab   1
    at   1
    ba   1
    bl   1
    cr   1
    ed   1
    en   1 
    et   1
    fi   1
    fr   1
    ge   1
    ha   1
    il   2
    in   1
    ll   1
    ly   1
    me   2
    nc   1
    ng   1
    ob   1
    om   3
    ou   1
    pr   1
    pt   1
    ri   1
    ro   2
    ry   1
    so   2
    st   1
    te   1
    th   1
    tr   1
    wi   1
    yo   1
    yp   1