Suppose we build a table of statistics showing how often each word is used in some English text or book. We can gather these statistics for every text or book in a library. What is the simplest way to compare these statistics with each other? How can we find groups or clusters of texts with statistically very similar lexicons?
First, you'd need to normalize the lexicons (i.e., ensure that both are expressed over the same vocabulary, with a count of zero for any word a text doesn't use).
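As a rough sketch of that normalization step, assuming whitespace tokenization and lowercase folding (both simplifications), two texts can be mapped onto a shared vocabulary like this. The function names `word_counts` and `align` are made up for illustration:

```python
from collections import Counter

def word_counts(text):
    # Naive tokenization (an assumption): lowercase, split on whitespace.
    return Counter(text.lower().split())

def align(counts_a, counts_b):
    # Shared vocabulary: every word that occurs in either text.
    vocab = sorted(set(counts_a) | set(counts_b))
    # Guard against empty texts so we never divide by zero.
    total_a = sum(counts_a.values()) or 1
    total_b = sum(counts_b.values()) or 1
    # Counter returns 0 for missing words, so absent words get frequency 0.
    freq_a = [counts_a[w] / total_a for w in vocab]
    freq_b = [counts_b[w] / total_b for w in vocab]
    return vocab, freq_a, freq_b

vocab, freq_a, freq_b = align(word_counts("the cat sat"), word_counts("the dog"))
```

Dividing by the total word count turns raw counts into relative frequencies, so a long book and a short article become comparable.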
Then you could use a similarity metric such as the Hellinger distance (which expects the counts to be converted to relative frequencies) or the cosine similarity to compare the two lexicons.
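For illustration, here is a minimal sketch of both metrics in plain Python. It assumes each text has already been turned into a list of numbers over the same vocabulary, in the same word order; for the Hellinger distance those numbers should be relative frequencies that sum to 1. The function names are chosen here for the example:

```python
import math

def hellinger(p, q):
    # Hellinger distance between two discrete distributions:
    # 0 means identical, 1 means no words in common.
    s = sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q))
    return math.sqrt(s) / math.sqrt(2)

def cosine_similarity(p, q):
    # Cosine of the angle between the two vectors:
    # 1 means same direction, 0 means orthogonal.
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    if norm_p == 0 or norm_q == 0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_p * norm_q)

same = hellinger([0.5, 0.5], [0.5, 0.5])
disjoint = hellinger([1.0, 0.0], [0.0, 1.0])
```

Computing one of these for every pair of texts gives a distance matrix, which is the usual input for a clustering algorithm if you want to find groups of similar texts.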
It may also be a good idea to look into machine learning packages such as Weka.
This book is an excellent source for machine learning and you may find it useful.