I am making a API call and getting my JSON data like so:
import requests
import jmespath
import pandas as pd
import json
url = 'a.com'
r = requests.get(url).json()
The object returned looks like this:
{'question': [{
'response': {'firstname': {'value': 'John'},
'lastname': {'value': 'Bob'}},
'profile_question': [{
'identities': [{'type': 'ID,
'value': '1'},
{'type': 'EMAIL',
'value': 'test@test.com'}]}]}]}
I tried putting this into json.fr but I get error that it is not correctly formed json. How every I can crawl this object as is but not successfully for what I need.
I am trying to use jmespath
library to crawl and want to pull out four pieces of information firstname, lastname, ID, EMAIL
like so and appending the data into list:
lst =[]
fname = jmespath.search('question[*].response.{firstname:firstname.value}',my_dict)
lst.append(fname)
lname = jmespath.search('question[*].response.{lastname:lastname.value}',my_dict)
lst.append(lname)
email_path = jmespath.search("question[*].profile_question[].identities.{email:[?type=='EMAIL'].value}",my_dict)
lst.append(email)
ID = jmespath.search("question[*].profile_question[].identities.{email:[?type=='ID'].value}",my_dict)
lst.append(ID)
I append into a list in hopes of creating tuples per iteration that I can push into a dataframe.
The list looks like this:
[[{'firstname': 'John'}],
[{'lastname': 'Bob'}],
[{'email': ['test@test.com']}],
[{'ID': ['1']}]]
However when I crawl the dictionary with missing values like so:
{'question': [{
'response': {'firstname': {'value': 'John'},
'lastname': {'value': 'Bob'}},
'profile-question': [{
'identities': [{'type': 'ID,
'value': '1'},
{'type': 'EMAIL',
'value': 'test@test.com'}]}]}],
'response': {'firstname': {'value': 'John1'},
'lastname': {'value': 'Bob1'}},
'profile-question': [{
'identities': [{'type': 'ID,
'value': '2'}]}]}
causes my list to behave like this (I can not tell why):
[[{'firstname': 'John'}], [{'email': ['test@test.com']}], [{'email': ['1']},[{'firstname': 'John'}],
[{'lastname': 'Bob'}],
[{'email': [][][]}],
[{'ID': ['1']}]]]
which causes the df to look like this:
firstname lastname email ID
john bob test@test.com 1
john1 bob1 test@test.com 1
How do I crawl a JSON dict object as it comes in from the API, pulling out four pieces of data firstname, lastname, email, ID
and appending into a dataframe like so? :
firstname lastname email ID
john bob test@test.com 1
john1 bob1 2
More than willing to get away from jmespath
library, and to add, the above dictionary has many more fields, I have shortened so only the key points and their indentation is listed.
Well before anything the reason for the error is because the json object is missing quotes after ID.
{'question': [{
'response': {'firstname': {'value': 'John'},
'lastname': {'value': 'Bob'}},
'profile_question': [{
'identities': [{'type': 'ID,
'value': '1'},
{'type': 'EMAIL',
'value': 'test@test.com'}]}]}]}
It should look like this:
{'question': [{
'response': {'firstname': {'value': 'John'},
'lastname': {'value': 'Bob'}},
'profile_question': [{
'identities': [{'type': 'ID',
'value': '1'},
{'type': 'EMAIL',
'value': 'test@test.com'}]}]}]}
From here you can use the json library to turn the json object into a python dictionary object with json.loads()
. Once you fixed the json object, your code can look something like this.
import jmespath as jp
import pandas as pd
jon = {'question':
[{'response': {'firstname': {'value': 'John'},
'lastname': {'value': 'Bob'}},
'profile_question': [{'identities': [{'type': 'ID',
'value': '1'},
{'type': 'EMAIL', 'value': 'test@test.com'}]}]}]}
jsons = [jon] # list of all json objects
df_list = []
for json in jsons:
try:
fname = jp.search('question[*].response.firstname.value', jon)[0]
except IndexError:
fname = None
try:
lname = jp.search('question[*].response.lastname.value', jon)[0]
except IndexError:
lname = None
try:
email = jp.search("question[*].profile_question[].identities.{email:[?type=='EMAIL'].value}", jon)[0]['email'][0]
except IndexError:
email = None
try:
user_id = jp.search("question[*].profile_question[].identities.{email:[?type=='ID'].value}", jon)[0]['email'][0]
except IndexError:
user_id = None
df_list.append(pd.DataFrame({'firstname': fname, 'lastname': lname, 'email': email, 'id': user_id}, index=[0]))
df = pd.concat(df_list, ignore_index=True, sort=False)
print(df)