java email mime

Detecting language of email body


I need to implement an automated email reply system.

The system has to check incoming emails and reply to each one in the language in which it was written.

How can I do this? Please suggest some ideas. Thanks in advance.


One more question:

  1. The email headers include a header like this:

    Content-Type: text/plain; charset=ISO-8859-1
    

How reliable is this header for determining the language of the email body?

For example (all headers taken from Gmail):

  1. For a Chinese subject and body: Content-Type: text/plain; charset=GB2312

  2. For a Korean subject and body: Content-Type: text/plain; charset=EUC-KR

  3. For a French/Italian subject and body: Content-Type: text/html; charset=ISO-8859-1

Also, can anybody point me to a list that maps languages to charsets?

Thanks in advance
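
To make the charset idea concrete, here is a minimal sketch of reading the charset parameter from a Content-Type value and looking up candidate languages. The class name, method names, and the mapping table are assumptions made up for illustration; the table is not an official or complete list. Note that a legacy charset can only narrow the candidates: ISO-8859-1 covers many Western European languages, and a Unicode charset such as UTF-8 can carry any language at all.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: treats the charset as a hint, not as a language detector.
final class CharsetLanguageHint {

    // Matches charset=VALUE or charset="VALUE" inside a Content-Type value.
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\";\\s]+)\"?", Pattern.CASE_INSENSITIVE);

    // Illustrative, incomplete table (an assumption, not a standard mapping).
    private static final Map<String, List<String>> CANDIDATES = Map.of(
            "GB2312", List.of("zh"),
            "BIG5", List.of("zh"),
            "EUC-KR", List.of("ko"),
            "SHIFT_JIS", List.of("ja"),
            "ISO-8859-1", List.of("en", "fr", "it", "de", "es"));

    // Returns the charset name in upper case, or empty if the parameter is absent.
    static Optional<String> extractCharset(String contentType) {
        Matcher m = CHARSET.matcher(contentType);
        return m.find()
                ? Optional.of(m.group(1).toUpperCase(Locale.ROOT))
                : Optional.empty();
    }

    // Returns ISO 639-1 codes the charset is commonly used for; empty if unknown.
    static List<String> candidateLanguages(String contentType) {
        return extractCharset(contentType)
                .map(cs -> CANDIDATES.getOrDefault(cs, List.of()))
                .orElse(List.of());
    }

    public static void main(String[] args) {
        System.out.println(candidateLanguages("text/plain; charset=EUC-KR"));
        System.out.println(candidateLanguages("text/html; charset=ISO-8859-1"));
        System.out.println(candidateLanguages("text/plain; charset=UTF-8"));
    }
}
```

With this table, EUC-KR yields a single candidate, ISO-8859-1 yields several, and UTF-8 yields none, so the body text itself still has to be inspected in the last two cases.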


Solution

  • Google Translate can guess the language of a sample text. Have a look at its API; it could solve your problem, provided you are connected to the internet anyway and don't mind sending fragments of emails to Google's servers.

    For offline evaluation, I found the Java Text Categorizing Library.
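
    To illustrate what offline detection on the body text can look like, here is a toy sketch that counts common stop words per language. It does not use the Java Text Categorizing Library or reproduce its API; stop-word counting is a simpler stand-in technique, and the class name and word lists are assumptions chosen for illustration. A dedicated library will be far more robust, especially on short or mixed-language emails.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical toy detector: scores a text by counting known stop words.
final class StopWordLanguageGuesser {

    // Tiny hand-picked stop-word lists (illustrative only, far from complete).
    private static final Map<String, Set<String>> STOP_WORDS = new LinkedHashMap<>();
    static {
        STOP_WORDS.put("en", Set.of("the", "and", "is", "you", "for", "with", "this", "that"));
        STOP_WORDS.put("fr", Set.of("le", "la", "les", "et", "est", "vous", "pour", "avec", "une"));
        STOP_WORDS.put("it", Set.of("il", "lo", "gli", "che", "per", "con", "non", "sono", "una"));
        STOP_WORDS.put("de", Set.of("der", "die", "das", "und", "ist", "nicht", "mit", "auch", "ein"));
    }

    // Returns the language code with the most stop-word hits, or "unknown" if none match.
    static String guess(String text) {
        // Split on anything that is not a letter, so punctuation and digits are dropped.
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+");
        String best = "unknown";
        int bestScore = 0;
        for (Map.Entry<String, Set<String>> entry : STOP_WORDS.entrySet()) {
            int score = 0;
            for (String token : tokens) {
                if (entry.getValue().contains(token)) {
                    score++;
                }
            }
            // Ties keep the earlier language; a real system should treat ties as uncertain.
            if (score > bestScore) {
                bestScore = score;
                best = entry.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(guess("Thank you for the quick reply, this is what I needed."));
        System.out.println(guess("Merci pour votre message, je vous envoie les documents avec une facture."));
        System.out.println(guess("Grazie per il messaggio, non sono sicuro che sia giusto."));
    }
}
```

    An auto-responder could call `guess` on the decoded body and fall back to a default language when the result is "unknown".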