Search code examples
jsonpython-3.xpandasnested-listsjson-normalize

How do I extract lists identified by value from nested list in json object?


I'd really appreciate your help. I am using json_normalize to access the nested list "terms" and extract the "objectid" and "medium" values in the data below.

with open('file_name.txt','r', encoding = 'utf8') as f:
data=json.loads(f.read())

df = pd.json_normalize(data, record_path = ['terms'],meta=['objectid','medium']) 
print(df)

Instead of extracting all lists I only want to extract lists with term types "Classification" and "Materials".

"medium": "Handscroll; ink on paper",      
        "terms": [{
            "term": "Chinese",
            "aatid": 300018322,
            "id": 2033355,
            "termtype": "Culture"
        },
        {
            "term": "scroll paintings",
            "aatid": 300033666,
            "id": 2053034,
            "termtype": "Classification"
        },
        {
            "term": "handscrolls",
            "aatid": 300178463,
            "id": 2053038,
            "termtype": "Classification"
        },
        {
            "term": "tigers",
            "aatid": null,
            "id": 2127399,
            "termtype": "Subject"
        },
        {
            "term": "paper (fiber product)",
            "aatid": 300014109,
            "id": 2043813,
            "termtype": "Materials"
        },
        {
            "term": "ink",
            "aatid": 300015012,
            "id": 2167672,
            "termtype": "Materials"
        }
    ],
     "objectid": "138361",

I'd like to print:

                 term        aatid       id        termtype objectid                    medium
          handscrolls  300178463.0  2053038  Classification   138361  Handscroll; ink on paper
     scroll paintings  300033666.0  2053034  Classification   138361  Handscroll; ink on paper
paper (fiber product)  300014109.0  2043813       Materials   138361  Handscroll; ink on paper
                  ink  300015012.0  2167672       Materials   138361  Handscroll; ink on paper

              

Solution

  • This should do the job, if everything else is ok. Your data isn't a valid json. First validate your json using https://jsonlint.com/

    df.loc[(df.termtype == "Classification") | (df.termtype == "Materials")]