Search code examples
pythonjsonpandasdataframedata-analysis

Python - Manage nested JSON to DataFrame


I'm at the beginning of my data scientist journey and I'm struggling with JSON and Python. Even if I know the basics of the DataFrame manipulation and JSON format manipulation, I've just got a JSON file with this data:

{"Orders":
    [
        {
            "OrderID":"1000004209",
            "Email":"[email protected]",
            "AnnoNascita":"-",
            "Age":"-",
            "Gender":"-",
            "Provincia":"CR",
            "OrderDate":"2019-05-02 14:05:16",
            "OrderStatus":"wc-failed",
            "OrderTotal":"31.90",
            "TotalDiscount":"0",
            "OrderSubTotal":"31.9",
            "Coupon":"",
            "OrderItems": {
                "Item": {
                    "ProductName":"Eau de Parfum Zafferano",
                    "Sku":"44160",
                    "Quantity":"1",
                    "ItemCost":"27.00",
                    "ItemTotal":"27",
                    "Category":"ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)"
                }
            }
        },
        {
            "OrderID":"1000004210",
            "Email":"[email protected]",
            "AnnoNascita":"-",
            "Age":"-",
            "Gender":"-",
            "Provincia":"GE",
            "OrderDate":"2019-05-02 14:17:32",
            "OrderStatus":"wc-cancelled",
            "OrderTotal":"9.00",
            "TotalDiscount":"0",
            "OrderSubTotal":"9",
            "Coupon":"",
            "OrderItems": {
                "Item": {
                    "ProductName":"Sapone Marsiglia 200 g",
                    "Sku":"01026",
                    "Quantity":"1",
                    "ItemCost":"4.10",
                    "ItemTotal":"4.1",
                    "Category":"MARSEILLE;SAPONETTE"
                }
            }
        },
        {
            "OrderID":"1000004211",
            "Email":"[email protected]",
            "AnnoNascita":"-",
            "Age":"-",
            "Gender":"-",
            "Provincia":"GE",
            "OrderDate":"2019-05-02 14:21:42",
            "OrderStatus":"wc-cancelled",
            "OrderTotal":"31.90",
            "TotalDiscount":"0",
            "OrderSubTotal":"31.9",
            "Coupon":"",
            "OrderItems": {
                "Item": {
                    "ProductName":"Eau de Parfum Zafferano",
                    "Sku":"44160",
                    "Quantity":"1",
                    "ItemCost":"27.00",
                    "ItemTotal":"27",
                    "Category":"ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)"
                }
            }
        },
        {
            "OrderID":"1000004235",
            "Email":"[email protected]",
            "AnnoNascita":"-",
            "Age":"-",
            "Gender":"-",
            "Provincia":"CR",
            "OrderDate":"2019-05-03 09:37:06",
            "OrderStatus":"wc-cancelled",
            "OrderTotal":"31.90",
            "TotalDiscount":"0",
            "OrderSubTotal":"31.9",
            "Coupon":"",
            "OrderItems": {
                "Item": [
                    {
                        "ProductName":"Eau de Parfum Zafferano",
                        "Sku":"44160",
                        "Quantity":"1",
                        "ItemCost":"27.00",
                        "ItemTotal":"27",
                        "Category":"ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)"
                    },
                    {
                        "ProductName":"Sapone Vegetale Lavanda Officinalis Bio",
                        "Sku":"01049",
                        "Quantity":"1",
                        "ItemCost":"4.90",
                        "ItemTotal":"4.9",
                        "Category":"ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)"
                    }
                ]
            }
        },
        {
            "OrderID":"1000004292",
            "Email":"[email protected]",
            "AnnoNascita":"-",
            "Age":"-",
            "Gender":"-",
            "Provincia":"CR",
            "OrderDate":"2019-05-06 08:52:47",
            "OrderStatus":"wc-failed",
            "OrderTotal":"64.90",
            "TotalDiscount":"0",
            "OrderSubTotal":"64.9",
            "Coupon":"",
            "OrderItems": {
                "Item": [
                    {
                        "ProductName":"Schiuma da Barba Pour Homme",
                        "Sku":"45396",
                        "Quantity":"2",
                        "ItemCost":"12.00",
                        "ItemTotal":"24",
                        "Category":"POUR HOMME;LINEE UOMO;RASATURA"
                    },
                    {
                        "ProductName":"Detergente Intimo Delicato Mamma",
                        "Sku":"38420",
                        "Quantity":"1",
                        "ItemCost":"11.00",
                        "ItemTotal":"11",
                        "Category":"POUR HOMME;LINEE UOMO;RASATURA"
                    },
                    {
                        "ProductName":"Spray per Ambiente - Preziosa",
                        "Sku":"44231",
                        "Quantity":"2",
                        "ItemCost":"10.00",
                        "ItemTotal":"20",
                        "Category":"POUR HOMME;LINEE UOMO;RASATURA"
                    },
                    {
                        "ProductName":"Cuscinetti Profumati - Preziosa",
                        "Sku":"45491",
                        "Quantity":"1",
                        "ItemCost":"9.90",
                        "ItemTotal":"9.9",
                        "Category":"POUR HOMME;LINEE UOMO;RASATURA"
                    }
                ]
            }
        },
    ]
}

What I'm trying to do is to build a DataFrame with Pandas in order to manipulate data and gather informations.

At first, I've tried to use the pd.read_json('path_to_file') function from Pandas, but I got this result:

                                            Orders
0   {'OrderID': '1000004209', 'Email': 'name@ma...
1   {'OrderID': '1000004210', 'Email': 'name@ma...
2   {'OrderID': '1000004211', 'Email': 'name@ma...
3   {'OrderID': '1000004235', 'Email': 'name@ma...
4   {'OrderID': '1000004292', 'Email': 'name@ma...

I've tried to get a DatFrame from each row using pd.DataFrame(df['Orders']), but it returns the same DataFrame. I've tried to use a for loop to append single rows in a new DataFrame but I got to a dead-end road here too.

I got quite lost looking into all the topics related to this here on StackOverflow without finding a solution to my problem.

Actually, what I need is to create a DataFrame with a column for every main value (like "OrderID", "Email", "AnnoNascita" and so on) and a column named "OrderItems" with the array of all value in "Item". I was thinking about something like:

        OrderID             Email   AnnoNascita   Age     Gender     Provincia               OrderDate     OrderStatus     OrderTotal     Coupon     OrderItems
0    1000004209     [email protected]             -     -          -            CR     2019-05-02 14:05:16       wc-failed          31.90                {"Items":[{"ProductName":"Schiuma da Barba Pour Homme","Sku":"45396","Quantity":"2","ItemCost":"12.00","ItemTotal":"24","Category":"POUR HOMME;LINEE UOMO;RASATURA"},{"ProductName":"Detergente Intimo Delicato Mamma","Sku":"38420","Quantity":"1","ItemCost":"11.00","ItemTotal":"11","Category":"POUR HOMME;LINEE UOMO;RASATURA"}]}

If you have any suggestion on how to build a better DataFrame instead of what I though, I'm glad to read about it and change my mind. As I told, I'm at the beginning and I really would appreciate any advice.

PS: I would be really glad if you can also explain the solution you are providing, because I'm actually try to learn data manipulation and it would be really helpful not only to have a solution, but also to understand it.

Thanks to everyone who gonna spend his/her time to help me!


Solution

  • Use pd.json_normalize(), as follows:

    Assuming your json file is named data:

    df = pd.json_normalize(data['Orders'])
    

    pd.json_normalize() normalizes semi-structured JSON data into a flat table. As it flattens the nested structure, you can access all fields within the JSON.

    You need to specify the first tag Orders in order to access and expand the column contents within it. Otherwise, you will only get a column Orders.

    Result:

    print(df)
    
          OrderID          Email AnnoNascita Age Gender Provincia            OrderDate   OrderStatus OrderTotal TotalDiscount OrderSubTotal Coupon OrderItems.Item.ProductName OrderItems.Item.Sku OrderItems.Item.Quantity OrderItems.Item.ItemCost OrderItems.Item.ItemTotal                           OrderItems.Item.Category                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          OrderItems.Item
    0  1000004209  [email protected]           -   -      -        CR  2019-05-02 14:05:16     wc-failed      31.90             0          31.9            Eau de Parfum Zafferano               44160                        1                    27.00                        27  ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      NaN
    1  1000004210  [email protected]           -   -      -        GE  2019-05-02 14:17:32  wc-cancelled       9.00             0             9             Sapone Marsiglia 200 g               01026                        1                     4.10                       4.1                                MARSEILLE;SAPONETTE                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      NaN
    2  1000004211  [email protected]           -   -      -        GE  2019-05-02 14:21:42  wc-cancelled      31.90             0          31.9            Eau de Parfum Zafferano               44160                        1                    27.00                        27  ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      NaN
    3  1000004235  [email protected]           -   -      -        CR  2019-05-03 09:37:06  wc-cancelled      31.90             0          31.9                                NaN                 NaN                      NaN                      NaN                       NaN                                                NaN                                                                                                                                                                                                                                                                                                             [{'ProductName': 'Eau de Parfum Zafferano', 'Sku': '44160', 'Quantity': '1', 'ItemCost': '27.00', 'ItemTotal': '27', 'Category': 'ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)'}, {'ProductName': 'Sapone Vegetale Lavanda Officinalis Bio', 'Sku': '01049', 'Quantity': '1', 'ItemCost': '4.90', 'ItemTotal': '4.9', 'Category': 'ZAFFERANO;EAU DE PARFUM;LINEE UOMO;PROFUMI (UOMO)'}]
    4  1000004292  [email protected]           -   -      -        CR  2019-05-06 08:52:47     wc-failed      64.90             0          64.9                                NaN                 NaN                      NaN                      NaN                       NaN                                                NaN  [{'ProductName': 'Schiuma da Barba Pour Homme', 'Sku': '45396', 'Quantity': '2', 'ItemCost': '12.00', 'ItemTotal': '24', 'Category': 'POUR HOMME;LINEE UOMO;RASATURA'}, {'ProductName': 'Detergente Intimo Delicato Mamma', 'Sku': '38420', 'Quantity': '1', 'ItemCost': '11.00', 'ItemTotal': '11', 'Category': 'POUR HOMME;LINEE UOMO;RASATURA'}, {'ProductName': 'Spray per Ambiente - Preziosa', 'Sku': '44231', 'Quantity': '2', 'ItemCost': '10.00', 'ItemTotal': '20', 'Category': 'POUR HOMME;LINEE UOMO;RASATURA'}, {'ProductName': 'Cuscinetti Profumati - Preziosa', 'Sku': '45491', 'Quantity': '1', 'ItemCost': '9.90', 'ItemTotal': '9.9', 'Category': 'POUR HOMME;LINEE UOMO;RASATURA'}]