Search code examples
pythonjsonpandasdataframenormalize

Normalize nested json with varying keys


I am getting very nested json for different items through an API and am then trying to convert some of the received information into a dataframe.

So far, I have worked with this line to get the dataframe I want:

df = pd.json_normalize(result, meta=['properties'], record_path=['view', 'item', 'fields', 'field'])

This works sometimes, but other times I either get a KeyError for the meta-keys or for the record-path itself (KeyError: 'view'). I assume that this is because the json I receive is not always exactly the same but can vary according to the type of item that information about is requested.

My question now is, if there is a way to skip data which doesn't have any of these keys? Or if there are other options to ignore the data that doesn't have those keys in it?

Alternatively, is there a way to find out for which items the keys differ and to treat those differently? Since we are talking of dumps with partially 800-1000 items in them with long json, I don't think that I can actually go through them manually to find the ones that don't fit...

Any help much appreciated!


Solution

  • The quick and dirty solution is to use errors='ignore' option:

    df = pd.json_normalize(result, meta=['properties'], record_path=['view', 'item', 'fields', 'field'], errors='ignore')
    

    If you also want to extract the entries that have the different structure (the ones that caused the errors), then the best thing would be to filter them out from the result object and then run pd.json_normalize again specifying the appropriate record_path.