I have a json with nested objects (nested list of objects): json:
{
"uniqueIdentifier": {
"identity": {
"textIdentifier": "MysticFoliage",
"encodedIdentifier": "<p>MysticFoliage</p>"
}
},
"languagePreference": {
"chosenLanguage": {
"displayText": "Enigmatic Tongue",
"languageCode": "<p>Enigmatic Tongue</p>"
}
},
"specialCode": 42,
"creationTimestamp": "1630411200000",
"creatorAlias": "whisperingShadow",
"unusualStatus": "UNEXPLORED",
"identifierCode": "XyzAbc123",
"nativeTongue": [],
"modificationTimestamp": "1670160000000",
"modifierAlias": "arcaneTraveler",
"designatedNameWithComma": {
"designation": {
"displayText": "Foliage, Mystic",
"encodedText": "<p>Foliage, Mystic</p>"
}
},
"otherRecognizedUniqueIdentifiers": [],
"otherKnownLanguages": [],
"otherKnownCodes": [],
"territories": [
{
"identityCode": "DefGhi456",
"designationEn": "Enchanted Forest",
"designationLanguage": "Enchanted Forest"
}
],
"scientificDesignation": {
"generatedScientificDesignation": {
"designation": {
"displayText": "Mysticus plantae enigma",
"encodedText": "<em>Mysticus</em> <em>plantae</em> enigma"
},
"status": "MYSTERIOUS"
},
"genusInfo": {
"code": "ZwxYvu789",
"designation": "Mysticus"
},
"speciesInfo": {
"code": "RstUvw123",
"designation": "plantae"
},
"subSpeciesInfo": {
"code": "KlmNop456",
"designation": "enigma"
},
"subSpeciesPrefixes": [
"sub."
],
"varietyPrefixes": [
"var."
]
},
"statusState": "UNKNOWN",
"currentCondition": "ETERNAL",
"classificationGroup": {
"groupCode": "PqrStu789",
"designation": "Enigma Kingdom",
"designationLanguage": "Enigmatic Realm"
}
}
I used pd.json_normalize(json)
on it but some of the fields are still nested, such as "territories":
territories |
---|
[{'identityCode': 'DefGhi456', 'designationEn': 'Enchanted Forest', 'designationLanguage': 'Enchanted Forest'}] |
what I want is:
territories.identityCode | territories.designationEn | territories.designationLanguage |
---|---|---|
'DefGhi456' | 'Enchanted Forest' | 'Enchanted Forest' |
I tried pd.json_normalize(json, "territories")
but this only gives the output for "territories" and not the rest of the json. (it does normalize 'territories' it correctly).
From this Nested list after json_normalize answer it says to do e.g:
metadata = ['name', 'period', 'title', 'description', 'id']
out = pd.json_normalize(data_read['data'], 'values', metadata)
But I have a lot of column titles and several other jsons will more fields, it would require me to create many lists manually. I tried just getting the column names using df.columns by doing:
fileReader = json.loads(data)
df = pd.DataFrame(fileReader)
metadata = list(df.columns)
j2 = pd.json_normalize(fileReader, "ranges", metadata)
But I get error:
KeyError: "Key 'note' not found. To replace missing values of 'note' with np.nan, pass in errors='ignore'"
Trying with errors='ignore'
I get:
ValueError: Conflicting metadata name id, need distinguishing prefix
Which seems to mean I'd still need to manually set the column names for some of the "id" columns? Which is what I want to avoid.
Try:
import json
with open("data.json", "r") as f_in:
data = json.load(f_in)
df = pd.json_normalize(data).explode("territories")
df = pd.concat(
[df, df.pop("territories").apply(pd.Series).add_prefix("territories.")], axis=1
)
print(df)
Prints:
specialCode creationTimestamp creatorAlias unusualStatus identifierCode nativeTongue modificationTimestamp modifierAlias otherRecognizedUniqueIdentifiers otherKnownLanguages otherKnownCodes statusState currentCondition uniqueIdentifier.identity.textIdentifier uniqueIdentifier.identity.encodedIdentifier languagePreference.chosenLanguage.displayText languagePreference.chosenLanguage.languageCode designatedNameWithComma.designation.displayText designatedNameWithComma.designation.encodedText scientificDesignation.generatedScientificDesignation.designation.displayText scientificDesignation.generatedScientificDesignation.designation.encodedText scientificDesignation.generatedScientificDesignation.status scientificDesignation.genusInfo.code scientificDesignation.genusInfo.designation scientificDesignation.speciesInfo.code scientificDesignation.speciesInfo.designation scientificDesignation.subSpeciesInfo.code scientificDesignation.subSpeciesInfo.designation scientificDesignation.subSpeciesPrefixes scientificDesignation.varietyPrefixes classificationGroup.groupCode classificationGroup.designation classificationGroup.designationLanguage territories.identityCode territories.designationEn territories.designationLanguage
0 42 1630411200000 whisperingShadow UNEXPLORED XyzAbc123 [] 1670160000000 arcaneTraveler [] [] [] UNKNOWN ETERNAL MysticFoliage <p>MysticFoliage</p> Enigmatic Tongue <p>Enigmatic Tongue</p> Foliage, Mystic <p>Foliage, Mystic</p> Mysticus plantae enigma <em>Mysticus</em> <em>plantae</em> enigma MYSTERIOUS ZwxYvu789 Mysticus RstUvw123 plantae KlmNop456 enigma [sub.] [var.] PqrStu789 Enigma Kingdom Enigmatic Realm DefGhi456 Enchanted Forest Enchanted Forest