Search code examples
pythonjsonpandasjson-normalize

JSON Normalize with Value as Column


I have the following JSON;

{
  "data": [
    {
      "gid": "1203715497540179",
      "completed": false,
      "custom_fields": [
        {
          "gid": "1203887422469746",
          "enabled": true,
          "name": "Inputs",
          "description": "",
          "display_value": null,
          "resource_subtype": "text",
          "resource_type": "custom_field",
          "text_value": null,
          "type": "text"
        },
        {
          "gid": "1126427465960522",
          "enabled": false,
          "name": "T-Minus",
          "description": "",
          "display_value": "54",
          "resource_subtype": "text",
          "resource_type": "custom_field",
          "text_value": "54",
          "type": "text"
        }
      ],
      "due_on": "2023-01-25",
      "name": "General Information"
    }
  ]
}

And I want to build the following pandas dataframe with it. Basically I want to grab name from custom_fields and make it a column whose value is display_value

name                 due_on         Inputs   T-Minus
General Information  2023-01-25      null      54

I don't think this can be done with just normalizing. So I started with:

df = pd.json_normalize(test, 
                       record_path =['custom_fields'],
                       record_prefix='_',
                       errors='ignore',
                       meta=['name', 'due_on'])

This gets me to something like this:

_name _display_value name due_on .....(extra fields that I do not need)
Inputs  null         General Information
T-Minus  54          General Information

How can I go now from this dataframe to the one I want?


Solution

  • Use pivot just after pd.json_normalize:

    df = pd.json_normalize(test,  # or test['data']? 
                           record_path =['custom_fields'],
                           record_prefix='_',
                           errors='ignore',
                           meta=['name', 'due_on'])
    
    df = (df.pivot(index=['name', 'due_on'], columns='_name', values='_display_value')
            .reset_index().rename_axis(columns=None))
    

    Output:

    >>> df
                      name      due_on Inputs T-Minus
    0  General Information  2023-01-25   None      54