Search code examples
pythonpandasdataframedictionarymulti-index

How to convert multiple dictionary keys in a Pandas Series to columns in a DataFrame?


I have the following pandas DataFrame with 2 columns: Address and Transactions.

    Address                                     Transactions
0   0x88aDa02f6fCE2F1A835567B4999D62a7ebb70367  [{'type': 'outflow', 'amount': '250,000 VSO'}, {'type': inflow, 'amount': 100,000}]
1   0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d  [{'type': 'inflow', 'amount': '9.1283802424254'}, {'type': inflow, 'amount': 100,000}]
2   0x5852346d9dC3d64d81dc82fdddd5Cc1211157cD5  [{'type': 'outflow', 'amount': '7,200 VSO'}, {'type': inflow, 'amount': 100,000}]

Each Address has multiple transactions, All transactions of an address are represented by a list containing one dictionary per transaction.

Each dictionary has two keys and two values: type and amount, respectively.

The code that creates the table above is below:

df_dict = pd.DataFrame(dict_all_txs_all_addresses.items(), columns=['Address', 'Transactions'])

What I want to do:
I want to create a multi-index (maybe unnecessary?) table that would look sort of like this:

    Address                                         Type                             Amount
    0   0x88aDa02f6fCE2F1A835567B4999D62a7ebb70367  outflow                          250,000 VSO
                                                    inflow                           100,000 VSO

    1   0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d  inflow                           330,000 VSO
                                                    inflow                           150,000 VSO'

It shows each transaction in a different row, while maintining only one address. Note that this model table has 3 columns.

Maybe this could be solved using df.groupby() instead of a multi-index df?

Here is an example of a dictionary, for easier reading and manipulation:

dict_all_txs_all_addresses = {
        "0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d": [
            {
                "amount": "330,000 VSO",
                "type": "inflow"
            },
            {
                "amount": "150,000 VSO",
                "type": "inflow"
            }
        ],
        "0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367": [
            {
                "amount": "250,000 VSO",
                "type": "outflow"
            },
            {
                "amount": "100,000 VSO",
                "type": "inflow"
            }
        ]
    }

Solution

  • We can use pd.json_normalize here to get a tidy format which is workable:

    df = df.explode("Transactions", ignore_index=True)
    df = pd.concat([df, pd.json_normalize(df.pop("Transactions"))], axis=1)
    
                                          Address       amount     type
    0  0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d  330,000 VSO   inflow
    1  0x00979Bd14bD5Eb5c424c5478d3BF4b6E9212bA7d  150,000 VSO   inflow
    2  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  250,000 VSO  outflow
    3  0x88aDa02f6fCE2F1A833cd9B4999D62a7ebb70367  100,000 VSO   inflow