I am trying to make a program that helps break a ciphertext without knowing the plaintext or the key.
As output I want the probable plaintext whose statistics are closest to normal English, plus a set of probable candidate keys.
I started with frequency analysis and have completed it. It tells me how often each letter occurs, but I have no idea how to generate keys from that.
using System;
using System.IO;

class Program
{
    static void Main()
    {
        // Array to store frequencies: one slot for every char value 0..char.MaxValue.
        // (new int[char.MaxValue] is one slot short and would throw on '\uffff'.)
        int[] c = new int[char.MaxValue + 1];
        // Read entire text file.
        string s = File.ReadAllText("text.txt");
        // Iterate over each character.
        foreach (char t in s)
        {
            // Increment table.
            c[t]++;
        }
        // Write all letters (and digits) found.
        for (int i = 0; i < c.Length; i++)
        {
            if (c[i] > 0 && char.IsLetterOrDigit((char)i))
            {
                Console.WriteLine("Letter: {0} Frequency: {1}", (char)i, c[i]);
            }
        }
    }
}
A Caesar cipher just replaces each plaintext character with the one a fixed number of places further down the alphabet. Assuming a single case and English text, it is trivial to produce all 26 possible decryptions and pick out the correct one by eye.
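To let the program pick the decryption instead of doing it by eye, one option (a common technique, not something the answer itself describes) is to score each of the 26 candidates with a chi-squared statistic against English letter frequencies and sort by that score. A minimal sketch, assuming only A-Z matters and using approximate English frequency percentages from commonly published tables; the class and method names are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class CaesarBreaker
{
    // Approximate relative frequencies (percent) of A..Z in English text.
    // Assumed values from commonly published tables; sources differ slightly.
    static readonly double[] English =
    {
        8.2, 1.5, 2.8, 4.3, 12.7, 2.2, 2.0, 6.1, 7.0, 0.15, 0.77, 4.0, 2.4,
        6.7, 7.5, 1.9, 0.095, 6.0, 6.3, 9.1, 2.8, 0.98, 2.4, 0.15, 2.0, 0.074
    };

    // Shifts each ASCII letter back by 'key' places (0..25), preserving case;
    // everything else passes through unchanged.
    public static string Decrypt(string cipherText, int key)
    {
        var sb = new StringBuilder(cipherText.Length);
        foreach (char ch in cipherText)
        {
            if (ch >= 'A' && ch <= 'Z') sb.Append((char)('A' + (ch - 'A' - key + 26) % 26));
            else if (ch >= 'a' && ch <= 'z') sb.Append((char)('a' + (ch - 'a' - key + 26) % 26));
            else sb.Append(ch);
        }
        return sb.ToString();
    }

    // Chi-squared distance between the letter counts of 'text' and English;
    // a lower score means "more English-like".
    public static double ChiSquared(string text)
    {
        var counts = new int[26];
        int total = 0;
        foreach (char ch in text.ToUpperInvariant())
        {
            if (ch >= 'A' && ch <= 'Z') { counts[ch - 'A']++; total++; }
        }
        if (total == 0) return double.MaxValue;

        double score = 0;
        for (int i = 0; i < 26; i++)
        {
            double expected = total * English[i] / 100.0;
            double diff = counts[i] - expected;
            score += diff * diff / expected;
        }
        return score;
    }

    // Tries all 26 keys and returns them ranked from most to least likely.
    public static List<(int Key, string PlainText, double Score)> RankKeys(string cipherText)
    {
        return Enumerable.Range(0, 26)
            .Select(k =>
            {
                string p = Decrypt(cipherText, k);
                return (Key: k, PlainText: p, Score: ChiSquared(p));
            })
            .OrderBy(r => r.Score)
            .ToList();
    }
}
```

The first entries of RankKeys give the set of probable candidate keys, and the first PlainText is the most probable plaintext. With very short ciphertexts the statistic becomes unreliable, so it is still worth eyeballing the top few.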
For a substitution cipher you need to generalise your solution. A simplified method is to do a frequency count as you've suggested and sort the ciphertext characters in descending order of frequency. Map those, in order, to the letters (again for English) ETAOINSRHDLUCMFYWGPBVKXQJZ: for example, assume the most frequent character represents E, the next most frequent T, and so on. Use that mapping to do the decryption. The more ciphertext you have, the better the decryption will be. It is unlikely to be completely accurate, but it will usually give you enough information to fill in the gaps manually.
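The rank-mapping step could look something like the sketch below. The helper names are hypothetical, only A-Z is considered, and the English order is assumed to be ETAOINSRHDLUCMFYWGPBVKXQJZ (published tables disagree somewhat in the tail). Cipher letters with equal counts are ordered alphabetically, which is an arbitrary tie-break.

```csharp
using System;
using System.Linq;
using System.Text;

public static class FrequencySubstitution
{
    // English letters from most to least frequent (assumed ordering).
    const string EnglishOrder = "ETAOINSRHDLUCMFYWGPBVKXQJZ";

    // Builds a 26-entry key: key[c - 'A'] is the plaintext letter guessed for cipher letter c.
    public static char[] GuessKey(string cipherText)
    {
        var counts = new int[26];
        foreach (char ch in cipherText.ToUpperInvariant())
            if (ch >= 'A' && ch <= 'Z') counts[ch - 'A']++;

        // Cipher letters sorted by descending count; ties broken alphabetically.
        int[] ranked = Enumerable.Range(0, 26)
            .OrderByDescending(i => counts[i])
            .ThenBy(i => i)
            .ToArray();

        // The r-th most frequent cipher letter maps to the r-th most frequent English letter.
        var key = new char[26];
        for (int r = 0; r < 26; r++)
            key[ranked[r]] = EnglishOrder[r];
        return key;
    }

    // Applies the key, preserving case and leaving non-letters untouched.
    public static string Decrypt(string cipherText, char[] key)
    {
        var sb = new StringBuilder(cipherText.Length);
        foreach (char ch in cipherText)
        {
            if (ch >= 'A' && ch <= 'Z') sb.Append(key[ch - 'A']);
            else if (ch >= 'a' && ch <= 'z') sb.Append(char.ToLowerInvariant(key[ch - 'a']));
            else sb.Append(ch);
        }
        return sb.ToString();
    }
}
```

GuessKey followed by Decrypt gives the first-guess plaintext; the returned key array is the candidate key you would then correct by hand as the readable fragments appear.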
A more sophisticated solution might generate the mapping from the full frequency distribution rather than just the sort order, and use known facts about the language, e.g. Q is almost always followed by U. You can get really fancy and check digraph and trigram frequencies: http://practicalcryptography.com/cryptanalysis/letter-frequencies-various-languages/english-letter-frequencies/
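One way to put digraph statistics to work automatically is hill climbing: start from some key (for example a frequency-rank guess), try swapping every pair of key letters, and keep a swap whenever the decrypted text contains more common English bigrams. This is a swapped-in technique, not something the answer describes, and the short bigram list below is an illustrative assumption rather than a real frequency table; a serious solver would typically score with log-probabilities over all bigrams or quadgrams. All names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BigramHillClimb
{
    // Assumed, deliberately small set of common English bigrams (QU covers "Q is followed by U").
    static readonly HashSet<string> CommonBigrams = new HashSet<string>
    {
        "TH", "HE", "IN", "ER", "AN", "RE", "ND", "ON", "EN", "AT",
        "OU", "ED", "HA", "TO", "OR", "IT", "IS", "HI", "ES", "NG", "QU"
    };

    // Counts adjacent letter pairs found in CommonBigrams. Non-letters are dropped
    // first, so pairs spanning word boundaries are counted too.
    public static int Score(string text)
    {
        string letters = new string(text.ToUpperInvariant().Where(c => c >= 'A' && c <= 'Z').ToArray());
        int score = 0;
        for (int i = 0; i + 1 < letters.Length; i++)
            if (CommonBigrams.Contains(letters.Substring(i, 2))) score++;
        return score;
    }

    // key[c - 'A'] is the plaintext letter for cipher letter c; output is upper case.
    public static string Decrypt(string cipherText, char[] key)
    {
        char[] chars = cipherText.ToUpperInvariant().ToCharArray();
        for (int i = 0; i < chars.Length; i++)
            if (chars[i] >= 'A' && chars[i] <= 'Z') chars[i] = key[chars[i] - 'A'];
        return new string(chars);
    }

    // Greedy hill climbing: keep any pairwise swap that strictly raises the score,
    // and stop when a full pass over all pairs finds no improvement.
    public static char[] Improve(string cipherText, char[] startKey)
    {
        var key = (char[])startKey.Clone();
        int best = Score(Decrypt(cipherText, key));
        bool improved = true;
        while (improved)
        {
            improved = false;
            for (int a = 0; a < 26; a++)
            {
                for (int b = a + 1; b < 26; b++)
                {
                    (key[a], key[b]) = (key[b], key[a]);
                    int s = Score(Decrypt(cipherText, key));
                    if (s > best) { best = s; improved = true; }
                    else { (key[a], key[b]) = (key[b], key[a]); } // undo the swap
                }
            }
        }
        return key;
    }
}
```

Greedy climbing stops at the first key that no single swap can improve, which may only be a local optimum; rerunning from several shuffled starting keys and keeping the best-scoring result is a common remedy.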