json python-3.x pandas dictionary jmespath

How to crawl JSON dictionary object into pandas dataframe regardless if value missing in dictionary?

I am making a API call and getting my JSON data like so:

import requests 
import jmespath
import pandas as pd
import json

url = 'a.com'

r = requests.get(url).json()

The object returned looks like this:

{'question': [{
   'response': {'firstname': {'value': 'John'},
    'lastname': {'value': 'Bob'}},
   'profile_question': [{
     'identities': [{'type': 'ID,
       'value': '1'},
      {'type': 'EMAIL',
       'value': 'test@test.com'}]}]}]}

I tried putting this into json.fr but I get error that it is not correctly formed json. How every I can crawl this object as is but not successfully for what I need.

I am trying to use jmespath library to crawl and want to pull out four pieces of information firstname, lastname, ID, EMAIL like so and appending the data into list:

    lst =[]

    fname = jmespath.search('question[*].response.{firstname:firstname.value}',my_dict)
    lst.append(fname)

    lname = jmespath.search('question[*].response.{lastname:lastname.value}',my_dict)
    lst.append(lname)

    email_path = jmespath.search("question[*].profile_question[].identities.{email:[?type=='EMAIL'].value}",my_dict)
    lst.append(email)

    ID = jmespath.search("question[*].profile_question[].identities.{email:[?type=='ID'].value}",my_dict)
    lst.append(ID)

I append into a list in hopes of creating tuples per iteration that I can push into a dataframe.

The list looks like this:

[[{'firstname': 'John'}],
 [{'lastname': 'Bob'}],
 [{'email': ['test@test.com']}],
 [{'ID': ['1']}]]

However when I crawl the dictionary with missing values like so:

{'question': [{
   'response': {'firstname': {'value': 'John'},
    'lastname': {'value': 'Bob'}},
   'profile-question': [{
     'identities': [{'type': 'ID,
       'value': '1'},
      {'type': 'EMAIL',
       'value': 'test@test.com'}]}]}],
   'response': {'firstname': {'value': 'John1'},
    'lastname': {'value': 'Bob1'}},
   'profile-question': [{
     'identities': [{'type': 'ID,
       'value': '2'}]}]}

causes my list to behave like this (I can not tell why):

[[{'firstname': 'John'}], [{'email': ['test@test.com']}], [{'email': ['1']},[{'firstname': 'John'}],
 [{'lastname': 'Bob'}],
 [{'email': [][][]}],
 [{'ID': ['1']}]]]

which causes the df to look like this:

firstname         lastname      email                ID
john                 bob        test@test.com        1 
john1                bob1       test@test.com        1

How do I crawl a JSON dict object as it comes in from the API, pulling out four pieces of data firstname, lastname, email, ID and appending into a dataframe like so? :

firstname         lastname      email                    ID
    john                 bob        test@test.com        1 
    john1                bob1                            2

More than willing to get away from jmespath library, and to add, the above dictionary has many more fields, I have shortened so only the key points and their indentation is listed.

Solution

Well before anything the reason for the error is because the json object is missing quotes after ID.

 {'question': [{
   'response': {'firstname': {'value': 'John'},
    'lastname': {'value': 'Bob'}},
   'profile_question': [{
     'identities': [{'type': 'ID,
       'value': '1'},
      {'type': 'EMAIL',
       'value': 'test@test.com'}]}]}]}

It should look like this:

{'question': [{
   'response': {'firstname': {'value': 'John'},
    'lastname': {'value': 'Bob'}},
   'profile_question': [{
     'identities': [{'type': 'ID',
       'value': '1'},
      {'type': 'EMAIL',
       'value': 'test@test.com'}]}]}]}

From here you can use the json library to turn the json object into a python dictionary object with json.loads(). Once you fixed the json object, your code can look something like this.

import jmespath as jp
import pandas as pd


jon = {'question':
              [{'response': {'firstname': {'value': 'John'},
                             'lastname': {'value': 'Bob'}},
                'profile_question': [{'identities': [{'type': 'ID',
                                                      'value': '1'},
                                                     {'type': 'EMAIL', 'value': 'test@test.com'}]}]}]}


jsons = [jon] # list of all json objects
df_list = []
for json in jsons:
    try:
        fname = jp.search('question[*].response.firstname.value', jon)[0]
    except IndexError:
        fname = None
    try:
        lname = jp.search('question[*].response.lastname.value', jon)[0]
    except IndexError:
        lname = None
    try:
        email = jp.search("question[*].profile_question[].identities.{email:[?type=='EMAIL'].value}", jon)[0]['email'][0]
    except IndexError:
        email = None
    try:
        user_id = jp.search("question[*].profile_question[].identities.{email:[?type=='ID'].value}", jon)[0]['email'][0]
    except IndexError:
        user_id = None
    df_list.append(pd.DataFrame({'firstname': fname, 'lastname': lname, 'email': email, 'id': user_id}, index=[0]))


df = pd.concat(df_list, ignore_index=True, sort=False)
print(df)