python pandas dataframe loops iso-639

How to convert ISO 639-1 language codes to language names in Python?


I have the following Pandas series:

>>> df.original_language.value_counts()
en    32269
fr     2438
it     1529
ja     1350
de     1080
      ...  
la        1
jv        1
sm        1
gl        1
mt        1
Name: original_language, Length: 92, dtype: int64

I want to convert these language codes into their full language names, for example:

en >> English

ar >> Arabic

I looked up this question but it didn't help. If any packages are required, please show how to install them with pip if possible.


Solution

  • Use the iso-639 module:

    # pip install iso-639
    from iso639 import languages
    # Look up each two-letter ISO 639-1 code and keep the language's name
    df['lang'] = df['lang'].apply(lambda x: languages.get(alpha2=x).name)
    

    Output:

           lang  count
    0   English  32269
    1    French   2438
    2   Italian   1529
    3  Japanese   1350
    4    German   1080
    5     Latin      1
    6  Javanese      1
    7    Samoan      1
    8  Galician      1
    9   Maltese      1
    
    

    To convert the codes in your original DataFrame, use:

    from iso639 import languages
    df['original_language'] = df['original_language'].apply(lambda x: languages.get(alpha2=x).name)
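
With 92 distinct codes in the column, some values may not be valid ISO 639-1 codes, and a failed lookup would then stop the whole apply. Here is a minimal sketch of a fallback wrapper. The helper name code_to_name and the small SAMPLE_NAMES table are hypothetical and only there so the snippet runs without any packages installed; it also assumes a failed lookup raises KeyError.

```python
def code_to_name(code, lookup, default="Unknown"):
    """Return the language name for `code`, or `default` if the lookup fails.

    `lookup` is any callable that maps a code to a name and raises
    KeyError for codes it does not know (an assumption of this sketch).
    """
    try:
        return lookup(code)
    except KeyError:
        return default


# Stand-in table for illustration only (hypothetical, not from iso-639).
SAMPLE_NAMES = {"en": "English", "fr": "French", "ja": "Japanese"}

# "xx" is not in the table, so it falls back to the default.
names = [code_to_name(c, SAMPLE_NAMES.__getitem__) for c in ["en", "fr", "xx"]]
print(names)  # ['English', 'French', 'Unknown']
```

With iso-639 installed, the same helper could be plugged into the line above, assuming languages.get raises KeyError for codes it does not know:

    df['original_language'] = df['original_language'].apply(
        lambda c: code_to_name(c, lambda x: languages.get(alpha2=x).name))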