artificial-intelligence nlp

How to programmatically determine what language the content of a website is written in


I would like to programmatically determine the language that a website's content is written in.

The only thing that comes to mind is to compare the content of the website with a set of words that are common in each candidate language, and pick the language with the highest match percentage.
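To make the idea concrete, here is a minimal sketch of that word-matching approach. The word lists, the `COMMON_WORDS` name, and the `guess_language` function are all made up for illustration; real lists would need to be much larger, and the page text is assumed to have already been extracted from the HTML.

```python
import re

# Tiny illustrative word lists (an assumption, not a real resource);
# a usable system would need far larger lists per language.
COMMON_WORDS = {
    "english": {"the", "and", "is", "of", "to", "in", "that", "it"},
    "french": {"le", "la", "les", "et", "est", "de", "un", "une"},
    "german": {"der", "die", "das", "und", "ist", "nicht", "ein", "zu"},
}


def guess_language(text):
    """Return (language, score), where score is the share of tokens
    found in that language's common-word list."""
    # Letters only: drop digits, underscores, and punctuation.
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    if not tokens:
        return None, 0.0
    scores = {
        lang: sum(tok in words for tok in tokens) / len(tokens)
        for lang, words in COMMON_WORDS.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        # No common word matched at all, so make no guess.
        return None, 0.0
    return best, scores[best]


if __name__ == "__main__":
    print(guess_language("The cat is in the garden and it is happy"))
```

Short texts and closely related languages will probably produce low or ambiguous scores with this kind of matching, which is one reason to look for something more robust.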

Are there any better and more robust ways to solve the problem?


Solution

  • A neural network tutorial that includes a language-classification example based on the average frequencies of letters: http://fann.sourceforge.net/fann_en.pdf
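As a rough illustration of the letter-frequency idea, the sketch below computes the relative frequency of each letter a-z and picks the language whose reference profile is closest. Note that it swaps the neural network for a plain nearest-profile comparison, so it is not the tutorial's method. The sample sentences and the names `SAMPLES`, `PROFILES`, `letter_frequencies`, and `classify` are assumptions made up for this example; real profiles would be averaged over much more text.

```python
import math
import string


def letter_frequencies(text):
    """Return the relative frequency of each letter a-z in text (26 floats).

    Accented and other non-ASCII letters are ignored in this sketch.
    """
    counts = [0] * 26
    for ch in text.lower():
        if ch in string.ascii_lowercase:
            counts[ord(ch) - ord("a")] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * 26


# Hypothetical one-sentence samples; far too small for real use.
SAMPLES = {
    "english": "the quick brown fox jumps over the lazy dog and then it runs away",
    "french": "le renard brun saute par dessus le chien paresseux et puis il part",
    "spanish": "el zorro marron salta sobre el perro perezoso y luego se va corriendo",
}
PROFILES = {lang: letter_frequencies(sample) for lang, sample in SAMPLES.items()}


def classify(text):
    """Return the language whose profile has the smallest Euclidean
    distance to the letter frequencies of text."""
    freqs = letter_frequencies(text)
    return min(PROFILES, key=lambda lang: math.dist(freqs, PROFILES[lang]))


if __name__ == "__main__":
    print(classify("the dog runs over the fox"))
```

The same 26-value vector could be fed to a trained classifier instead of the distance comparison, which is presumably closer to what the linked tutorial does.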