I'm trying to consume an api that has the following structure and load it into a pandas DF with one row per item ID e.g. 2, 6 and columns for the high, hightime, low and lowtime for each entry.
{
"data": {
"2": {
"high": 142,
"highTime": 1617214328,
"low": 140,
"lowTime": 1617214323
},
"6": {
"high": 182198,
"highTime": 1617214063,
"low": 182198,
"lowTime": 1617214137
So far I've been using json_normalise on the json response which loads one row with multiple nested columns for each entry:
data.2.high | data.2.highTime | data.2.low | data.2.lowTime etc
as I result, I tried adding the record_path for 'data' thinking that would address the fact that it's a nested list but doing so throws
raise TypeError(
257 f"{js} has non list value {result} for path {spec}. "
258 "Must be list or null."
I think that's because my res['data'] type is a dict, not a list in it's own right but I'm slightly confused how to go about solving that or if that's even right.
Any help would be greatly appreciated
Just use
df = pd.DataFrame.from_records(json_data['data']).T.reset_index()
In your scenario, pandas from_records
works better than json_normalise
. This the case because your response is structured in a way that the ids are the keys and not the values.
For instance, for this response example where there is a key id
and its correspondent value
json_data={
"data": [{
"id":2,
"high": 142,
"highTime": 1617214328,
"low": 140,
"lowTime": 1617214323
},{
"id":6,
"high": 182198,
"highTime": 1617214063,
"low": 182198,
"lowTime": 1617214137
}]}
would work fine with json_normalize
, as follows.
pd.json_normalize(json_data['data'])
id high highTime low lowTime
2 142 1617214328 140 1617214323
6 182198 1617214063 182198 1617214137
However, your JSON response contains ids as keys,
json_data={
"data": {
"2": {
"high": 142,
"highTime": 1617214328,
"low": 140,
"lowTime": 1617214323
},
"6": {
"high": 182198,
"highTime": 1617214063,
"low": 182198,
"lowTime": 1617214137
}}}
and so from_records
works better.
df = pd.DataFrame.from_records(json_data['data']).T.reset_index()
index high highTime low lowTime
2 142 1617214328 140 1617214323
6 182198 1617214063 182198 1617214137
Also, things were not working because you probably were passing the full json response json_data
instead of selecting by the data key json_data['data']
.