python, json, language-detection

How to add a key-value pair to each JSON entry holding the detected language of a single data field


I have weather alert data like this:

"alerts": [
    {
        "description": "There is a risk of frost (Level 1 of 2).\nMinimum temperature: ~ -2 \u00b0C",
        "end": 1612522800,
        "event": "frost",
        "sender_name": "DWD / Nationales Warnzentrum Offenbach",
        "start": 1612450800
    },
    {
        "description": "There is a risk of widespread icy surfaces (Level 1 of 3).\ncause: widespread ice formation or traces of snow",
        "end": 1612515600,
        "event": "widespread icy surfaces",
        "sender_name": "DWD / Nationales Warnzentrum Offenbach",
        "start": 1612450800
    },
    {
        "description": "Es treten Windb\u00f6en mit Geschwindigkeiten um 55 km/h (15m/s, 30kn, Bft 7) aus \u00f6stlicher Richtung auf. In exponierten Lagen muss mit Sturmb\u00f6en bis 65 km/h (18m/s, 35kn, Bft 8) gerechnet werden.",
        "end": 1612587600,
        "event": "WINDB\u00d6EN",
        "sender_name": "DWD / Nationales Warnzentrum Offenbach",
        "start": 1612522800
    },

Now I want to add a key-value pair to every single alert dict that contains the detected language of its 'description' field. I tried the following, but I can't get the syntax right:

import json
from langdetect import detect

with open("kiel.json", 'r') as f:
    data = json.loads(f.read())

data['ADDED_KEY'] = 'ADDED_VALUE'
#'ADDED_KEY' = 'lang' - should be added as a data field to EVERY alert
#'ADDED_VALUE' = 'en' or 'ger' - should be the detected language [via detect()] from data field 'description' of every alert 

with open("kiel.json", 'w') as f:
    f.write(json.dumps(data, sort_keys=True, indent=4, separators=(',', ': ')))

So far I have only managed to add the key at the top level of the whole file, like this:

{
"ADDED_KEY": "ADDED_VALUE",
"alerts": [
    {
        "description": "There is a risk of frost (Level 1 of 2).\nMinimum temperature: ~ -2 \u00b0C",
        "end": 1612522800,
        "event": "frost",
        "sender_name": "DWD / Nationales Warnzentrum Offenbach",
        "start": 1612450800
    },

Can you help me complete the code so that it accesses the right data fields?

Further:

Now there is a case where 'alerts' is not included as a data field at all (for example, when no alert data is transmitted because the weather is fine). I still want to generate the JSON in that case. I tried:

for item in data['alerts']:
    if 'alerts' not in data:
        continue
else:
    item['lang'] = detect(item['description'])

But if there is no 'alerts' data field, I get:

      for item in data['alerts']:
KeyError: 'alerts'

How can I solve this? Is "continue" not the right statement here? Or do I have to swap the if check and the for loop? Thanks again!


Solution

  • The following works. Iterate over the alerts and add the key/value pair to each one, as you described.

    import json
    from langdetect import detect

    # Load the existing JSON document.
    with open("kiel.json", 'r') as f:
        data = json.load(f)

    # Add a 'lang' field to EVERY alert. Its value is the language that
    # detect() finds in the alert's 'description'. langdetect returns
    # ISO 639-1 codes, so German comes back as 'de', not 'ger'.
    for item in data['alerts']:
        item['lang'] = detect(item['description'])

    # Write the updated document back to the same file.
    with open("kiel.json", 'w') as f:
        json.dump(data, f, sort_keys=True, indent=4)
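
  • For the follow-up case where 'alerts' may be missing: the loop in the question fails because data['alerts'] is evaluated by the for statement itself, before the if check inside the loop body ever runs. One way around it is to look the key up with dict.get and an empty list as the default, so the loop simply has nothing to iterate over. The following is a minimal sketch under that assumption. The helper names add_lang and annotate_file are hypothetical, and the detector is passed in as a parameter so the logic can be tried without langdetect installed.

```python
import json


def add_lang(data, detect):
    """Add a 'lang' key to every alert; do nothing if 'alerts' is absent."""
    # dict.get returns the default ([]) when the key is missing, so the
    # loop body is skipped instead of raising KeyError.
    for item in data.get("alerts", []):
        item["lang"] = detect(item["description"])
    return data


def annotate_file(path, detect):
    """Read the JSON file at `path`, tag its alerts, and write it back."""
    with open(path, "r") as f:
        data = json.load(f)
    add_lang(data, detect)
    with open(path, "w") as f:
        json.dump(data, f, sort_keys=True, indent=4)


# Intended use with langdetect (not executed here):
#     from langdetect import detect
#     annotate_file("kiel.json", detect)
```

    An equivalent alternative is to keep the original loop and wrap it in "if 'alerts' in data:", which makes the check explicit before the loop starts. Either way, the file is still written when there are no alerts; it just comes out unchanged.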