python pandas series keyerror try-except

Identify location of missing keys that produce KeyError


I have a list of about 58,000 rows, and each row is a dictionary.

Example:

my_list_of_dicts = 
[{'id': '555', 'lang': 'en'}, 
{'id': '444', 'lang': 'en'}, 
{'id': '333', 'lang': 'fr'},
{'id': '222', 'lang': 'es'}, 
{'id': '111', 'lang': 'ge'},
{'id': '666', 'lang': 'fr'}, 
{'id': '777', : 'du'}]

Inside each dictionary, you'll see that I have a key 'lang' whose value is an abbreviation for a language ('en', 'es', 'fr', 'du', 'ge', etc.).

I have successfully written the code I need to produce a series which contains a value_count of all of the unique values within this key.

When I do this, however, I get a KeyError, apparently because a few dictionaries do not contain the 'lang' key.

I wrote a try/except block that lets me skip these missing values. It looks like there are about 5 rows out of 58,000 with a missing 'lang' key.

I want to find the location of these missing values for 'lang'. In other words, out of about 58,000 rows, how can I find which 5 rows have a missing 'lang' key?


Solution

  • You can use get and enumerate:

    my_list_of_dicts = [
        {'id': '555', 'lang': 'en'},
        {'id': '444', 'lang': 'en'},
        {'id': '333', 'lang': 'fr'},
        {'id': '222', 'lang': 'es'},
        {'id': '111', 'lang': 'ge'},
        {'id': '666', 'lang': 'fr'},
        {'id': '777', 'missing_lang': 'du'},  # placeholder key: this row has no 'lang'
    ]

    # Collect the index of every dict whose 'lang' is missing (or falsy)
    missing_vals = [i for i, a in enumerate(my_list_of_dicts) if not a.get("lang", False)]
    print(missing_vals)  # [6]
    

    Bear in mind that the last dictionary in your original example contained : 'du' with no key in front of it. That is invalid syntax, so Python raises a SyntaxError before any of the file runs. I therefore gave that entry a placeholder key ("missing_lang") for the purposes of demonstration.
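
One caveat with the get-based check: it also flags rows where 'lang' is present but holds a falsy value such as an empty string. If only rows where the key is truly absent should be reported, a membership test is a possible alternative. The following is a minimal sketch, assuming a small hypothetical sample (the names rows, missing_key, missing_or_empty and missing_ids are made up for illustration and are not from the question):

```python
# Hypothetical sample: one row has an empty 'lang', one row has no 'lang' at all.
rows = [
    {'id': '555', 'lang': 'en'},
    {'id': '444', 'lang': ''},   # key present, value empty
    {'id': '333', 'lang': 'fr'},
    {'id': '777'},               # key absent
]

# Indices of rows where the 'lang' key is absent.
missing_key = [i for i, row in enumerate(rows) if 'lang' not in row]

# Indices flagged by the get-based check: absent key OR falsy value.
missing_or_empty = [i for i, row in enumerate(rows) if not row.get('lang')]

# Look up the ids of the rows that lack the key, to locate them in the source data.
missing_ids = [rows[i].get('id') for i in missing_key]

print(missing_key)       # [3]
print(missing_or_empty)  # [1, 3]
print(missing_ids)       # ['777']
```

Which check fits depends on whether an empty 'lang' should count as missing in your data.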