Markov chains are an (almost standard) way to generate random gibberish that looks intelligent to the untrained eye. How would you go about distinguishing Markov-generated text from human-written text?
It would be awesome if the resources you point to are Python-friendly.
You could use a "brute force" approach, whereby you compare the generated language to data collected on n-grams of higher order than the Markov model that generated it.
For example, if the text was generated with a 2nd-order Markov model, up to 3-grams will have the correct frequencies, but 4-grams probably won't.
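Here's a minimal sketch of the idea in Python. The texts and the threshold logic are toy placeholders, and in practice the reference corpus would be something much larger (like the Google dataset mentioned below), but it shows the basic check: compare how many n-grams of the suspect text are unseen in the reference as n grows past the model's order.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count n-grams in a whitespace-tokenized, lower-cased text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Toy placeholder texts -- in practice these would be the suspect text
# and a large reference corpus of human-written language.
suspect_text = "the quick brown fox jumps over the lazy dog"
reference_corpus = "the quick brown fox jumps over the lazy dog again and again"

# If the suspect text came from a 2nd-order Markov model, its 3-gram
# statistics may look plausible while its 4-gram statistics diverge
# noticeably from the reference.
for n in (3, 4):
    suspect = ngram_counts(suspect_text, n)
    reference = ngram_counts(reference_corpus, n)
    unseen = sum(1 for g in suspect if g not in reference)
    print(f"{n}-grams never seen in the reference: {unseen}/{len(suspect)}")
```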
You can get up to 5-gram frequencies from Google's public n-gram dataset. It's huge though (24 GB compressed), and you need to get it by post on DVDs from the LDC.
EDIT: Added some implementation details
The n-grams have already been counted, so you just need to store the counts (or frequencies) in a way that's quick to search. A properly indexed database, or perhaps a Lucene index, should work.
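As a sketch of the database route (Lucene would work just as well), here's one way to do it with SQLite. The table and column names are my own, and the rows would come from parsing the dataset files:

```python
import sqlite3

# Store pre-counted 5-grams with an index on the 4-word prefix, so that
# looking up a 5-gram and all its sibling continuations is fast.
conn = sqlite3.connect("ngrams.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS fivegrams (
        w1 TEXT, w2 TEXT, w3 TEXT, w4 TEXT, w5 TEXT,
        count INTEGER
    )
""")
conn.execute("CREATE INDEX IF NOT EXISTS idx_prefix ON fivegrams (w1, w2, w3, w4)")

def load_counts(rows):
    """rows: iterable of (w1, w2, w3, w4, w5, count) tuples parsed from the dataset."""
    conn.executemany("INSERT INTO fivegrams VALUES (?, ?, ?, ?, ?, ?)", rows)
    conn.commit()
```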
Given a piece of text, scan across it and look up the frequency of each 5-gram in your database, and see where it ranks compared to other 5-grams that start with the same 4 words.
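Continuing the SQLite sketch above (assuming the `fivegrams` table from the previous snippet), the scan-and-rank step might look like this:

```python
def prefix_rank(conn, fivegram):
    """Rank this 5-gram's count among all 5-grams sharing its first 4 words.

    Returns (rank, total): rank 1 means it is the most frequent continuation
    of that prefix; rank is None if the 5-gram never occurs in the data.
    """
    w1, w2, w3, w4, w5 = fivegram
    rows = conn.execute(
        "SELECT w5, count FROM fivegrams "
        "WHERE w1=? AND w2=? AND w3=? AND w4=? ORDER BY count DESC",
        (w1, w2, w3, w4),
    ).fetchall()
    for rank, (word, _count) in enumerate(rows, start=1):
        if word == w5:
            return rank, len(rows)
    return None, len(rows)

def scan_text(conn, text):
    """Slide a 5-word window over the text and rank each 5-gram."""
    tokens = text.lower().split()
    for i in range(len(tokens) - 4):
        fivegram = tuple(tokens[i:i + 5])
        yield fivegram, prefix_rank(conn, fivegram)
```

Text whose 5-grams consistently rank poorly (or are never seen at all) given their 4-word prefixes is a candidate for having been generated by a lower-order Markov model.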
In practice, a bigger obstacle might be the licensing terms of the dataset; using it in a commercial app might be prohibited.