Search code examples
pythonjsonpandasnormalize

Parse deeply nested JSON file


I'm struggling to get the information I need with json_normalize. I've looked at the documentation and probably 10 examples of deeply nested JSON files, but I can't quite grasp the context of the function well enough to extract the right info. I'm trying to build a data frame that would contain the timestamped values (values key) for each sensor. 1534023900 is the timestamp in UTC Seconds.

enter image description here

A short sample of the JSON is below.

Any thoughts?

{
    "created": "2020-05-12T15:10:37Z",
    "device": {
        "device_info": {
            "device_fw": 204,
            "device_sn": "06-02133",
            "device_trait": 2,
            "device_type": 190
        },
        "timeseries": [
            {
                "configuration": {
                    "sensors": [
                        {
                            "measurements": [
                                "BATTERY",
                                "BATTERY_MV"
                            ],
                            "port": 7,
                            "sensor_bonus_value": "Unavailable",
                            "sensor_firmware_ver": "Unavailable",
                            "sensor_number": 133,
                            "sensor_sn": "Unavailable"
                        },
                        {
                            "measurements": [
                                "REFERENCE_KPA",
                                "TEMPC_LOGGER"
                            ],
                            "port": 8,
                            "sensor_bonus_value": "Unavailable",
                            "sensor_firmware_ver": "Unavailable",
                            "sensor_number": 134,
                            "sensor_sn": "Unavailable"
                        }
                    ],
                    "valid_since": "2018-08-11T21:45:00Z",
                    "values": [
                        [
                            1534023900,
                            0,
                            19,
                            [
                                {
                                    "description": "Battery Percent",
                                    "error": false,
                                    "units": "%",
                                    "value": 100
                                },
                                {
                                    "description": "Battery Voltage",
                                    "error": false,
                                    "units": " mV",
                                    "value": 7864
                                }
                            ],
                            [
                                {
                                    "description": "Reference Pressure",
                                    "error": false,
                                    "units": " kPa",
                                    "value": 100.62
                                },
                                {
                                    "description": "Logger Temperature",
                                    "error": false,
                                    "units": " \u00b0C",
                                    "value": 28.34
                                }
                            ]
                        ]
					}
				}
			}
		}
	}	
}


Solution

  • jmespath can help with nested data : the docs are quite robust, but the basics for accessing data are : if it is a key, then you can use a . if it is not the first entry in the data, if it is an array/list use the []

    Summary of your data position : device -> timeseries(dict)->[](array)->configuration(dict)->values(key)->[](array)->[0](array and get the first value)

    Actual code:

    import jmespath
    expression = jmespath.compile('device.timeseries[].configuration.values[][0]')
    expression.search(data)
    [1534023900]