I'd really appreciate your help. I am using json_normalize to access the nested list "terms" and extract the "objectid" and "medium" values in the data below.
with open('file_name.txt','r', encoding = 'utf8') as f:
data=json.loads(f.read())
df = pd.json_normalize(data, record_path = ['terms'],meta=['objectid','medium'])
print(df)
Instead of extracting all lists I only want to extract lists with term types "Classification" and "Materials".
"medium": "Handscroll; ink on paper",
"terms": [{
"term": "Chinese",
"aatid": 300018322,
"id": 2033355,
"termtype": "Culture"
},
{
"term": "scroll paintings",
"aatid": 300033666,
"id": 2053034,
"termtype": "Classification"
},
{
"term": "handscrolls",
"aatid": 300178463,
"id": 2053038,
"termtype": "Classification"
},
{
"term": "tigers",
"aatid": null,
"id": 2127399,
"termtype": "Subject"
},
{
"term": "paper (fiber product)",
"aatid": 300014109,
"id": 2043813,
"termtype": "Materials"
},
{
"term": "ink",
"aatid": 300015012,
"id": 2167672,
"termtype": "Materials"
}
],
"objectid": "138361",
I'd like to print:
term aatid id termtype objectid medium
handscrolls 300178463.0 2053038 Classification 138361 Handscroll; ink on paper
scroll paintings 300033666.0 2053034 Classification 138361 Handscroll; ink on paper
paper (fiber product) 300014109.0 2043813 Materials 138361 Handscroll; ink on paper
ink 300015012.0 2167672 Materials 138361 Handscroll; ink on paper
This should do the job, if everything else is ok. Your data isn't a valid json. First validate your json using https://jsonlint.com/
df.loc[(df.termtype == "Classification") | (df.termtype == "Materials")]