Search code examples
pythonjsonpandasjson-normalize

How to normalize nested fields with json_normalizer()?


I have a json with nested objects (nested list of objects): json:

{
    "uniqueIdentifier": {
        "identity": {
            "textIdentifier": "MysticFoliage",
            "encodedIdentifier": "<p>MysticFoliage</p>"
        }
    },
    "languagePreference": {
        "chosenLanguage": {
            "displayText": "Enigmatic Tongue",
            "languageCode": "<p>Enigmatic Tongue</p>"
        }
    },
    "specialCode": 42,
    "creationTimestamp": "1630411200000",
    "creatorAlias": "whisperingShadow",
    "unusualStatus": "UNEXPLORED",
    "identifierCode": "XyzAbc123",
    "nativeTongue": [],
    "modificationTimestamp": "1670160000000",
    "modifierAlias": "arcaneTraveler",
    "designatedNameWithComma": {
        "designation": {
            "displayText": "Foliage, Mystic",
            "encodedText": "<p>Foliage, Mystic</p>"
        }
    },
    "otherRecognizedUniqueIdentifiers": [],
    "otherKnownLanguages": [],
    "otherKnownCodes": [],
    "territories": [
        {
            "identityCode": "DefGhi456",
            "designationEn": "Enchanted Forest",
            "designationLanguage": "Enchanted Forest"
        }
    ],
    "scientificDesignation": {
        "generatedScientificDesignation": {
            "designation": {
                "displayText": "Mysticus plantae enigma",
                "encodedText": "<em>Mysticus</em> <em>plantae</em> enigma"
            },
            "status": "MYSTERIOUS"
        },
        "genusInfo": {
            "code": "ZwxYvu789",
            "designation": "Mysticus"
        },
        "speciesInfo": {
            "code": "RstUvw123",
            "designation": "plantae"
        },
        "subSpeciesInfo": {
            "code": "KlmNop456",
            "designation": "enigma"
        },
        "subSpeciesPrefixes": [
            "sub."
        ],
        "varietyPrefixes": [
            "var."
        ]
    },
    "statusState": "UNKNOWN",
    "currentCondition": "ETERNAL",
    "classificationGroup": {
        "groupCode": "PqrStu789",
        "designation": "Enigma Kingdom",
        "designationLanguage": "Enigmatic Realm"
    }
}

I used pd.json_normalize(json) on it but some of the fields are still nested, such as "territories":

territories
[{'identityCode': 'DefGhi456', 'designationEn': 'Enchanted Forest', 'designationLanguage': 'Enchanted Forest'}]

what I want is:

territories.identityCode territories.designationEn territories.designationLanguage
'DefGhi456' 'Enchanted Forest' 'Enchanted Forest'

I tried pd.json_normalize(json, "territories") but this only gives the output for "territories" and not the rest of the json. (it does normalize 'territories' it correctly).

From this Nested list after json_normalize answer it says to do e.g:

metadata = ['name', 'period', 'title', 'description', 'id']
out = pd.json_normalize(data_read['data'], 'values', metadata)

But I have a lot of column titles and several other jsons will more fields, it would require me to create many lists manually. I tried just getting the column names using df.columns by doing:

fileReader = json.loads(data)
df = pd.DataFrame(fileReader)
metadata = list(df.columns)

j2 = pd.json_normalize(fileReader, "ranges", metadata)

But I get error:

KeyError: "Key 'note' not found. To replace missing values of 'note' with np.nan, pass in errors='ignore'"

Trying with errors='ignore' I get:

ValueError: Conflicting metadata name id, need distinguishing prefix 

Which seems to mean I'd still need to manually set the column names for some of the "id" columns? Which is what I want to avoid.


Solution

  • Try:

    import json
    
    with open("data.json", "r") as f_in:
        data = json.load(f_in)
    
    df = pd.json_normalize(data).explode("territories")
    df = pd.concat(
        [df, df.pop("territories").apply(pd.Series).add_prefix("territories.")], axis=1
    )
    print(df)
    

    Prints:

       specialCode creationTimestamp      creatorAlias unusualStatus identifierCode nativeTongue modificationTimestamp   modifierAlias otherRecognizedUniqueIdentifiers otherKnownLanguages otherKnownCodes statusState currentCondition uniqueIdentifier.identity.textIdentifier uniqueIdentifier.identity.encodedIdentifier languagePreference.chosenLanguage.displayText languagePreference.chosenLanguage.languageCode designatedNameWithComma.designation.displayText designatedNameWithComma.designation.encodedText scientificDesignation.generatedScientificDesignation.designation.displayText scientificDesignation.generatedScientificDesignation.designation.encodedText scientificDesignation.generatedScientificDesignation.status scientificDesignation.genusInfo.code scientificDesignation.genusInfo.designation scientificDesignation.speciesInfo.code scientificDesignation.speciesInfo.designation scientificDesignation.subSpeciesInfo.code scientificDesignation.subSpeciesInfo.designation scientificDesignation.subSpeciesPrefixes scientificDesignation.varietyPrefixes classificationGroup.groupCode classificationGroup.designation classificationGroup.designationLanguage territories.identityCode territories.designationEn territories.designationLanguage
    0           42     1630411200000  whisperingShadow    UNEXPLORED      XyzAbc123           []         1670160000000  arcaneTraveler                               []                  []              []     UNKNOWN          ETERNAL                            MysticFoliage                        <p>MysticFoliage</p>                              Enigmatic Tongue                        <p>Enigmatic Tongue</p>                                 Foliage, Mystic                          <p>Foliage, Mystic</p>                                                      Mysticus plantae enigma                                    <em>Mysticus</em> <em>plantae</em> enigma                                                  MYSTERIOUS                            ZwxYvu789                                    Mysticus                              RstUvw123                                       plantae                                 KlmNop456                                           enigma                                   [sub.]                                [var.]                     PqrStu789                  Enigma Kingdom                         Enigmatic Realm                DefGhi456          Enchanted Forest                Enchanted Forest