python, nlp

How to determine the language of a piece of text?


I want to get this:

Input text: "ру́сский язы́к"
Output text: "Russian" 

Input text: "中文"
Output text: "Chinese" 

Input text: "にほんご"
Output text: "Japanese" 

Input text: "العَرَبِيَّة"
Output text: "Arabic"

How can I do this in Python?


Solution

  • Have you had a look at langdetect? It can be installed with pip install langdetect.

    from langdetect import detect
    
    # detect() returns a language code such as "de", not a language name
    lang = detect("Ein, zwei, drei, vier")
    
    print(lang)
    # output: de
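
    Since the question asks for names like "Russian" rather than codes like "ru", here is a minimal sketch of one way to bridge the two. The LANGUAGE_NAMES table and the helpers code_to_name and detect_language_name are hypothetical names made up for this illustration, not part of langdetect, and the table only covers the languages mentioned in this thread. Setting DetectorFactory.seed is the library's documented way to make results repeatable, since detection is otherwise non-deterministic.

```python
# Hypothetical lookup table (not part of langdetect); extend it as needed.
LANGUAGE_NAMES = {
    "ru": "Russian",
    "ja": "Japanese",
    "ar": "Arabic",
    "de": "German",
    "zh-cn": "Chinese",
    "zh-tw": "Chinese",
}


def code_to_name(code):
    """Map a langdetect-style code to an English name.

    Unknown codes are returned unchanged rather than raising.
    """
    return LANGUAGE_NAMES.get(code, code)


def detect_language_name(text):
    """Detect the language of `text` and return its English name."""
    # Imported here so code_to_name stays usable without langdetect installed.
    from langdetect import DetectorFactory, detect

    DetectorFactory.seed = 0  # fixed seed so repeated runs give the same answer
    return code_to_name(detect(text))
```

    You would call it as detect_language_name("Ein, zwei, drei, vier"). Very short inputs, such as the single words in the question, give the detector little to work with, so the result may be wrong for them; longer text is generally more reliable.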