c# arrays string frequency-analysis alphabet

Trying to replace letters in a string with the corresponding letter in the alphabet with a similar frequency


As the title states, I am trying to replace the most frequent letters in a given string with the letters that occur most often in English.

For example, if the string has more Ds in it than any other letter, then I would replace all the Ds with Es, as E is the most common letter in English. I would then continue this process going down the letter frequencies...

So I have had a shot at it, but my output is completely wrong.

I'm completely new to programming, so I'm sorry if it all disgusts you, but I'd still like to do it in the format I have already been following.

My code is below. I have done it in a few separate methods, and I was wondering if anyone can spot the problem I am having.

I believe it is replacing the wrong letter, but I really have no idea. I have only done a simple Caesar cipher before, so this isn't a large step, but I really can't get my head round what's going wrong.

Oh, and please ignore variable names etc.; they are just placeholders:

using System;
using System.IO;
using System.Linq;
using System.Text;

public class Decode
{
    public static void doDecode()
    {
        string decoding = File.ReadAllText(@"thing.txt", Encoding.Default);
        string alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        int counter = 0;
        int amount = 0;
        int[] letterAmounts = new int[26];

        decoding = decoding.Replace(Environment.NewLine, "");
        decoding = decoding.Replace(" ", "");

        foreach (char k in alphabet)
        {
            amount = Advanced.Adv(decoding, k);
            letterAmounts[counter] = amount;
            counter++;
        }
        File.WriteAllText(@"stuff.txt", Change.doChange(decoding, letterAmounts));
        System.Diagnostics.Process.Start(@"stuff.txt");
    }
}

So this simply calls the other classes and stores the counts it gets back in an array.

public class Advanced
{
    public static int Adv(string test, char c)
    {
        int count = 0;
        foreach (char x in test)
        {
            if (x == c)
            {
                count = count + 1;
            }
        }

        return count;
    }
}

This is called from doDecode and simply counts how many times a given letter occurs.

public class Change
{
    public static string doChange(string test, int[] letterAmounts)
    {
        string frequency = "ETAOINSHRDLCUMWFGYPBVKJXQZ";
        char[] mostFrequent = frequency.ToCharArray();
        string alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
        char[] abc = alphabet.ToCharArray();
        int most = 0;
        int position = 0;
        for (int tester = 0; tester < 26; tester++)
        {
            most = letterAmounts.Max();
            position = Array.IndexOf(letterAmounts, most);
            test = test.Replace(abc[position], mostFrequent[tester]);
            letterAmounts[position] = 0;
        }
        return test;
    }
}

This is where I believe the problem lies, but I cannot get my head around why. Again, I know it's messy, but any help is deeply appreciated.


Solution

  • It looks like this section is doing something strange:

    for (int tester = 0; tester < 26; tester++)
    {
        most = letterAmounts.Max();
        position = Array.IndexOf(letterAmounts, most);
        test = test.Replace(abc[position], mostFrequent[tester]);
        letterAmounts[position] = 0;
    }
    

    So, let's run through an example string of "I AM BOB". This gets converted to "IAMBOB", and letterAmounts, which has one slot per letter of the alphabet, ends up as A = 1, B = 2, I = 1, M = 1, O = 1, with 0 everywhere else. On the first pass the loop does the following:

    most = 2;
    position = 1; //IndexOf reports the zero-based index, so this is 'B'.
    test = test.Replace(abc[1], mostFrequent[0]);
    letterAmounts[1] = 0;

    That replaces the 'B's with 'E's, giving "IAMEOE", which is what you want. The next two passes replace 'A' with 'T' and 'I' with 'A', giving "ATMEOE". The fourth pass replaces 'M' with 'O', giving "ATOEOE", and then the fifth pass does this:

    most = 1;
    position = 14; //'O'
    test = test.Replace(abc[14], mostFrequent[4]);
    letterAmounts[14] = 0;

    This time you are replacing 'O's with 'I's, but the string now contains two 'O's: the original one and the one that used to be an 'M'. Both get replaced, giving "ATIEIE". Basically, you're not always replacing the letter you think you are, because letters written by an earlier pass get replaced again by a later one.

    There is a second problem once every letter that actually occurs has been handled. letterAmounts.Max() is then 0, and Array.IndexOf returns the first slot containing 0, which here is 'A' again (you zeroed it on the second pass). So on the sixth pass the 'A' that came from the 'I' is replaced with 'N', and the final output is "NTIEIE" instead of "ATOEIE".

    Presumably what you want is for each letter of the original text to be replaced exactly once. One way to get there is to create a List<Tuple<char, int>> so you can store both the letter and its number of occurrences, sort it by count, and stop when you reach a count of zero. A list of tuples also copes with several letters sharing the same count, which may well happen, as with 'A', 'I', 'M' and 'O' in this example.

    Then take the letter from each tuple and use that in place of abc[position] in the Replace call. However, you will still need to figure out how you want to handle letters that have already been replaced. Should they be replaced again, for example?
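
    As a rough sketch of the idea above, and not the only way to do it, the code below pairs each letter with its count in a List<Tuple<char, int>>, ranks the pairs, builds the whole substitution table first, and then translates every character of the original text once, so nothing is replaced twice. The class name FrequencySubstitution and the method name Substitute are made up for this illustration. It assumes the text is already upper case with spaces and newlines stripped, as in doDecode, and it resolves ties alphabetically, which is an arbitrary choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper for illustration; not part of the original code.
public static class FrequencySubstitution
{
    // English letters from most to least frequent, as in the question.
    private const string Frequency = "ETAOINSHRDLCUMWFGYPBVKJXQZ";

    public static string Substitute(string text)
    {
        // Pair each letter with its number of occurrences.
        List<Tuple<char, int>> counts = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            .Select(letter => Tuple.Create(letter, text.Count(c => c == letter)))
            .Where(pair => pair.Item2 > 0)          // skip letters that never occur
            .OrderByDescending(pair => pair.Item2)  // stable sort: ties stay alphabetical
            .ToList();

        // Build the complete mapping before touching the text.
        var map = new Dictionary<char, char>();
        for (int rank = 0; rank < counts.Count; rank++)
        {
            map[counts[rank].Item1] = Frequency[rank];
        }

        // Translate each original character exactly once.
        var result = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            char replacement;
            result.Append(map.TryGetValue(c, out replacement) ? replacement : c);
        }
        return result.ToString();
    }
}
```

    With the example from earlier, FrequencySubstitution.Substitute("IAMBOB") returns "ATOEIE". Characters that are not in the table (digits, punctuation, lower-case letters) are passed through unchanged.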