Search code examples
pandastimestampjson-normalize

Parsing JSON with number as key usng pandas


My question is similar to this one - Parsing JSON with number as key and this one Parsing nested JSON except that I want to parse my JSON file with pandas normalize. Eventually I want to write this dataframe into an excel file( and I have the code to do that) . The dataframe I want in memory should look like this

Timestamp  BatteryVoltage GridCurrent GridVoltage InverterCurrent InverterVoltage 

....

....

The current code that I have does not help me at all -

import json
import datetime
import pandas as pd
from pandas.io.json import json_normalize

with open('test.json') as data_file:
     data = json.load(data_file)


  

df = pd.json_normalize(data['timestamp'])

I know I have to give something as argument to json_normalize record_path but I am not sure what it is going to be as the value of timestamp keeps changing.

{"timestamp": {
"1636987025": {
  "batteryVoltage": 28.74732,
  "gridCurrent": 3.68084,
  "gridVoltage": 230.64401,
  "inverterCurrent": 2.00471,
  "inverterVoltage": 224.18573,
  "solarCurrent": 0,
  "solarVoltage": 0,
  "tValue": 1636987008
},
"1636987085": {
  "batteryVoltage": 28.52959,
  "gridCurrent": 3.40046,
  "gridVoltage": 230.41367,
  "inverterCurrent": 1.76206,
  "inverterVoltage": 225.24319,
  "solarCurrent": 0,
  "solarVoltage": 0,
  "tValue": 1636987136
},
"1636987146": {
  "batteryVoltage": 28.5338,
  "gridCurrent": 3.37573,
  "gridVoltage": 229.27209,
  "inverterCurrent": 2.11128,
  "inverterVoltage": 225.51733,
  "solarCurrent": 0,
  "solarVoltage": 0,
  "tValue": 1636987136
},
"1636987206": {
  "batteryVoltage": 28.55535,
  "gridCurrent": 3.43365,
  "gridVoltage": 229.47604,
  "inverterCurrent": 1.98594,
  "inverterVoltage": 225.83649,
  "solarCurrent": 0,
  "solarVoltage": 0,
  "tValue": 1636987264
  }
 }
}

Solution

  • pandas' json_normalize is designed for json/dicts that lists embedded in them; your data does not have that.

    One option is with jmespath:

    # pip install jmespath
    
    expression = jmespath.compile("timestamp.*")
    pd.DataFrame(expression.search(data)) 
       batteryVoltage  gridCurrent  gridVoltage  inverterCurrent  inverterVoltage  solarCurrent  solarVoltage      tValue
    0        28.74732      3.68084    230.64401          2.00471        224.18573             0             0  1636987008
    1        28.52959      3.40046    230.41367          1.76206        225.24319             0             0  1636987136
    2        28.53380      3.37573    229.27209          2.11128        225.51733             0             0  1636987136
    3        28.55535      3.43365    229.47604          1.98594        225.83649             0             0  1636987264
    

    have a look at the docs for jmespath; it is a wonderful tool that can be handy in wrangling json.

    Another option, without jmespath and just pure python:

    result = [value for _, value in data['timestamp'].items()]
    pd.DataFrame(result)
       batteryVoltage  gridCurrent  gridVoltage  inverterCurrent  inverterVoltage  solarCurrent  solarVoltage      tValue
    0        28.74732      3.68084    230.64401          2.00471        224.18573             0             0  1636987008
    1        28.52959      3.40046    230.41367          1.76206        225.24319             0             0  1636987136
    2        28.53380      3.37573    229.27209          2.11128        225.51733             0             0  1636987136
    3        28.55535      3.43365    229.47604          1.98594        225.83649             0             0  1636987264
    

    To capture the timestamp is easy:

    # for python 3.9, you can use | instead for the dictionary merging
    result = [{**value, **{'timestamp':key}} for key, value in data['timestamp'].items()]
    
    pd.DataFrame(result)
       batteryVoltage  gridCurrent  gridVoltage  inverterCurrent  inverterVoltage  solarCurrent  solarVoltage      tValue   timestamp
    0        28.74732      3.68084    230.64401          2.00471        224.18573             0             0  1636987008  1636987025
    1        28.52959      3.40046    230.41367          1.76206        225.24319             0             0  1636987136  1636987085
    2        28.53380      3.37573    229.27209          2.11128        225.51733             0             0  1636987136  1636987146
    3        28.55535      3.43365    229.47604          1.98594        225.83649             0             0  1636987264  1636987206
    

    Another option:

    pd.DataFrame.from_dict(data['timestamp'], orient='index').rename_axis('timestamp').reset_index()
        timestamp  batteryVoltage  gridCurrent  gridVoltage  inverterCurrent  inverterVoltage  solarCurrent  solarVoltage      tValue
    0  1636987025        28.74732      3.68084    230.64401          2.00471        224.18573             0             0  1636987008
    1  1636987085        28.52959      3.40046    230.41367          1.76206        225.24319             0             0  1636987136
    2  1636987146        28.53380      3.37573    229.27209          2.11128        225.51733             0             0  1636987136
    3  1636987206        28.55535      3.43365    229.47604          1.98594        225.83649             0             0  1636987264