Search code examples
pythonjsonexcelpandaslocalization

Localization from Excel to JSON using Python


I have JSON file with different languages. I have to add more value that are read from Excel file. For example, this is Excel table:

-------------------------------
| EN        | DE         | RU |
------------+------------+-----
| Ball      | Ball       | AA |
| Snow      | Schnee     | BB |
| Elephant  | Elephant   | CC |
| Woman     | Frau       | DD |
| Potato    | Kartoffeln | EE |
| Tomato    | F          | FF |
| Carrot    | G          | GG |
-------------------------------

JSON file in which I should add these values:

{
    "en": {
        "Ball": "Ball",
        "Snow": "Snow"
    },
    "de": {
        "Ball": "Ball",
        "Snow": "Schnee"
    },
    "ru": {
        "Ball": "AA",
        "Snow": "BB"
    }
}

Note: en is EN in Excel. JSON keys must be exactly the same as values in English. Tried this, but does not work:

# Importing dependencies 
import pandas
import json

# Reading xlsx into pandas dataframe
df = pandas.read_excel('Translations.xlsx')
df.iloc[0] = df.iloc[0].str.lower()
jsonfile = df.set_index('en', drop=False).to_json(indent=2)
# Encoding/decoding a Dataframe using 'columns' formatted JSON
jsonfile = df.to_json(orient='columns')

# Print out the result
print('Excel Sheet to JSON:\n', jsonfile)

# Make the string into a list to be able to input in to a JSON-file
json_dict = json.loads(jsonfile)

# write from and file to write to
with open('Localization.json', 'w', encoding='utf-8') as json_file:
    json.dump(json_dict, json_file)

Solution

  • To convert the dataframe to JSON, you can set en to be the index (and still keep the en column) and call to_json:

    json = df.set_index('en', drop=False).to_json(indent=2)
    

    Output:

    >>> print(json)
    {
      "en":{
        "Ball":"Ball",
        "Snow":"Snow",
        "Elephant":"Elephant",
        "Woman":"Woman",
        "Potato":"Potato",
        "Tomato":"Tomato",
        "Carrot":"Carrot"
      },
      "de":{
        "Ball":"Ball",
        "Snow":"Schnee",
        "Elephant":"Elephant",
        "Woman":"Frau",
        "Potato":"Kartoffeln",
        "Tomato":"F",
        "Carrot":"G"
      },
      "ru":{
        "Ball":"AA",
        "Snow":"BB",
        "Elephant":"CC",
        "Woman":"DD",
        "Potato":"EE",
        "Tomato":"FF",
        "Carrot":"GG"
      }
    }
    

    The whole script would probably be something like this:

    import json
    
    # Load old JSON from a file.
    with open('old_json.json') as f:
        old_json = json.load(f)
    
    # Load new data from spreadsheet.
    new_data = pd.read_excel('...')
    
    # Create dataframe from old JSON.
    old_data = pd.DataFrame(old_json)
    
    # Convert columns of both dataframes to lowercase.
    new_data.columns = new_data.columns.astype(str).str.lower()
    old_data.columns = old_data.columns.astype(str).str.lower()
    
    # Append new data to old data and convert joined data to JSON.
    new_json = pd.concat([old_data, new_data.set_index('en', drop=False)]).to_dict()
    
    # Save new JSON to a file.
    with open('new_json.json', 'w') as f:
        json.dump(new_json, f)