Search code examples
pythonpandasdataframemulti-index

Pandas Dataframe


I have a set of data, which is a nested dictionary. While columns a and b have a single entry, column c is composed of d and e, which have an arbitrary number of elements of the same length.

For instance:

N = 5
nested_dict = {
"a": np.random.randn(N),
"b": np.random.randn(N),
"c": [{"d":np.random.randn(i+1), "e":np.random.randn(i+1)} for i in range(N)]
}

How do I convert this to a multi-index Pandas Dataframe, such that column c has subheadings d and e, each with the length of the array supplied?

Edit: See an example of the desired format below:

An Example Dataframe

In addition, may I save and load this Dataframe as if it were a normal one?


Solution

  • Try something like:

    import pandas as pd
    import numpy as np
    
    N = 5
    nested_dict = {
        "a": np.random.randn(N),
        "b": np.random.randn(N),
        "c": [{"d": np.random.randn(i + 1), "e": np.random.randn(i + 1)} for i in range(N)]
    }
    
    df = pd.DataFrame(data=nested_dict)
    # Normalize Nested Dict and merge back
    # Set index to 'a', 'b' and unpack lists
    df = df.drop(columns=['c']) \
        .merge(pd.json_normalize(df['c']),
               left_index=True,
               right_index=True) \
        .set_index(['a', 'b']) \
        .apply(lambda x: x.apply(pd.Series).stack())
    
    # Add MultiIndex C back
    df.columns = pd.MultiIndex.from_product([['c'], df.columns])
    
    # For Display
    print(df.to_string())
    

    Output:

                                  c          
                                  d         e
    a         b                              
    -0.913707  1.015265 0  0.630905 -0.508003
     0.467421  1.880421 0  0.886313  0.026921
                        1 -0.720613  1.027585
    -0.314128 -0.756686 0  0.317922 -0.431624
                        1 -1.154708 -0.370363
                        2  0.400752 -0.000786
     0.488310 -0.230924 0  1.303703 -1.414924
                        1  0.242020  1.401058
                        2 -0.369507  0.648304
                        3  1.491819  1.010083
     1.248220 -0.351634 0  0.106272  0.518489
                        1 -1.916420 -0.068814
                        2 -0.090406 -0.237604
                        3 -0.208762  0.163396
                        4  0.664643 -1.272215