I'm trying to covert an api response from json to a dataframe in pandas. the problem I am having is that de data is nested in the json format and I am not getting the right columns in my dataframe.
The data is collect from a api with the following format:
{
"data": [
{
"timestamp": "2019-04-10T11:40:13.437Z",
"score": 87,
"sensors": [
{
"comp": "temp",
"value": 20.010000228881836
},
{
"comp": "humid",
"value": 34.4900016784668
},
{
"comp": "co2",
"value": 418
},
{
"comp": "voc",
"value": 166
},
{
"comp": "pm25",
"value": 4
},
{
"comp": "lux",
"value": 961.4000244140625
},
{
"comp": "spl_a",
"value": 45.70000076293945
}
],
"indices": [
{
"comp": "temp",
"value": -1
},
{
"comp": "humid",
"value": -2
},
{
"comp": "co2",
"value": 0
},
{
"comp": "voc",
"value": 0
},
{
"comp": "pm25",
"value": 0
}
]
},
{
"timestamp": "2019-04-10T11:40:03.413Z",
"score": 87,
"sensors": [
{
"comp": "temp",
"value": 20.040000915527344
},
{
"comp": "humid",
"value": 34.630001068115234
},
{
"comp": "co2",
"value": 418
},
{
"comp": "voc",
"value": 169
},
{
"comp": "pm25",
"value": 5
},
{
"comp": "lux",
"value": 960.2000122070312
},
{
"comp": "spl_a",
"value": 46
}
],
"indices": [
{
"comp": "temp",
"value": -1
},
{
"comp": "humid",
"value": -1
},
{
"comp": "co2",
"value": 0
},
{
"comp": "voc",
"value": 0
},
{
"comp": "pm25",
"value": 0
}
]
},
What I already tried is the following: I have converted the JSON format into a dict and then I normalized it with the normalize function. see the code as following:
data = r.json()
works_data = json_normalize(data=data['data'], record_path=['sensors'],meta=['timestamp'])
df = pd.DataFrame.from_dict(works_data)
print(df)
the result I am getting is:
comp value timestamp
0 temp 21.059999 2019-04-10T12:39:05.062Z
1 humid 31.250000 2019-04-10T12:39:05.062Z
2 co2 407.000000 2019-04-10T12:39:05.062Z
3 voc 136.000000 2019-04-10T12:39:05.062Z
4 pm25 3.000000 2019-04-10T12:39:05.062Z
5 lux 1302.099976 2019-04-10T12:39:05.062Z
6 spl_a 46.299999 2019-04-10T12:39:05.062Z
the result I need is as following: the result
is there somebody who can help me?
You can reshape your works_data:
data = {
"data": [
{
"timestamp": "2019-04-10T11:40:13.437Z",
"score": 87,
"sensors": [
{
"comp": "temp",
"value": 20.010000228881836
},
{
"comp": "humid",
"value": 34.4900016784668
},
{
"comp": "co2",
"value": 418
},
{
"comp": "voc",
"value": 166
},
{
"comp": "pm25",
"value": 4
},
{
"comp": "lux",
"value": 961.4000244140625
},
{
"comp": "spl_a",
"value": 45.70000076293945
}
],
"indices": [
{
"comp": "temp",
"value": -1
},
{
"comp": "humid",
"value": -2
},
{
"comp": "co2",
"value": 0
},
{
"comp": "voc",
"value": 0
},
{
"comp": "pm25",
"value": 0
}
]
},
{
"timestamp": "2019-04-10T11:40:03.413Z",
"score": 87,
"sensors": [
{
"comp": "temp",
"value": 20.040000915527344
},
{
"comp": "humid",
"value": 34.630001068115234
},
{
"comp": "co2",
"value": 418
},
{
"comp": "voc",
"value": 169
},
{
"comp": "pm25",
"value": 5
},
{
"comp": "lux",
"value": 960.2000122070312
},
{
"comp": "spl_a",
"value": 46
}
],
"indices": [
{
"comp": "temp",
"value": -1
},
{
"comp": "humid",
"value": -1
},
{
"comp": "co2",
"value": 0
},
{
"comp": "voc",
"value": 0
},
{
"comp": "pm25",
"value": 0
}
]
}]}
from pandas.io.json import json_normalize
import pandas as pd
df = pd.DataFrame()
for each in data['data']:
timestamp = each['timestamp']
temp_df = json_normalize(data=each, record_path=['sensors']).T
columns = list(temp_df.iloc[0])
data_values = list(temp_df.iloc[1,:])
temp_df = pd.DataFrame([data_values + [timestamp]], columns=columns + ['timestamp'])
df = df.append(temp_df).reset_index(drop=True)
print(df)
Output:
print(df)
temp humid co2 ... lux spl_a timestamp
0 20.010000 34.490002 418.0 ... 961.400024 45.700001 2019-04-10T11:40:13.437Z
1 20.040001 34.630001 418.0 ... 960.200012 46.000000 2019-04-10T11:40:03.413Z
[2 rows x 8 columns]