cryptography, frequency-analysis, frequency-distribution

Cryptography. English "normal text"?


I was asked to write software that encrypts and decrypts "normal English" text based on letter frequencies.

The question is: where can I find text samples whose letter frequencies match the official ones?

So far, I have tried "War and Peace" by Lev Tolstoy, but it didn't work well.

LE: I don't need just a list of words; I need a text sample that I can process.
LE2: The goal is to correctly guess 20 of the 26 letters in a 2000-character text.


Solution

  • You're looking for English text corpora, e.g. http://faculty.washington.edu/ebender/corpora/corpora.html#modern. Of the ones listed there, I know that Project Gutenberg is free; many of the others might not be.

    I'm not sure what you mean by "the official frequencies". The point of a frequency table is to match what you find in the wild; if your text doesn't match it, that's the frequency table's problem.
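
To check how well a given sample behaves, one option is to measure its letter frequencies yourself and compare a 2000-character excerpt against the whole text. The following is only a rough sketch in Python: the function names (`letter_frequencies`, `rank_letters`, `total_deviation`) are made up for illustration, and it assumes that only the 26 letters a-z matter, ignoring case, digits, and punctuation.

```python
from collections import Counter
import string


def letter_frequencies(text):
    """Relative frequency of each letter a-z in text.

    Assumption: case is ignored and every non a-z character is skipped.
    """
    letters = [c for c in text.lower() if c in string.ascii_lowercase]
    counts = Counter(letters)
    total = len(letters)
    # Letters that never occur get a frequency of 0.0.
    return {c: (counts[c] / total if total else 0.0)
            for c in string.ascii_lowercase}


def rank_letters(freqs):
    """Letters ordered from most to least frequent (ties broken alphabetically)."""
    return sorted(freqs, key=lambda c: (-freqs[c], c))


def total_deviation(sample, reference):
    """Sum of absolute frequency differences between two tables.

    0.0 means the two tables are identical; larger values mean a worse match.
    """
    return sum(abs(sample[c] - reference[c]) for c in string.ascii_lowercase)
```

For example, after loading a plain-text book from Project Gutenberg into a string `book`, `total_deviation(letter_frequencies(book[:2000]), letter_frequencies(book))` gives a rough idea of how far a 2000-character excerpt drifts from the table built on the full text, and comparing the two `rank_letters` results shows which letters swap places in the ranking.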