I have a set of data, which is a nested dictionary. While columns a
and b
have a single entry, column c
is composed of d
and e
, which have an arbitrary number of elements of the same length.
For instance:
N = 5
nested_dict = {
"a": np.random.randn(N),
"b": np.random.randn(N),
"c": [{"d":np.random.randn(i+1), "e":np.random.randn(i+1)} for i in range(N)]
}
How do I convert this to a multi-index Pandas Dataframe, such that column c
has subheadings d
and e
, each with the length of the array supplied?
Edit: See an example of the desired format below:
In addition, may I save and load this Dataframe as if it were a normal one?
Try something like:
import pandas as pd
import numpy as np
N = 5
nested_dict = {
"a": np.random.randn(N),
"b": np.random.randn(N),
"c": [{"d": np.random.randn(i + 1), "e": np.random.randn(i + 1)} for i in range(N)]
}
df = pd.DataFrame(data=nested_dict)
# Normalize Nested Dict and merge back
# Set index to 'a', 'b' and unpack lists
df = df.drop(columns=['c']) \
.merge(pd.json_normalize(df['c']),
left_index=True,
right_index=True) \
.set_index(['a', 'b']) \
.apply(lambda x: x.apply(pd.Series).stack())
# Add MultiIndex C back
df.columns = pd.MultiIndex.from_product([['c'], df.columns])
# For Display
print(df.to_string())
Output:
c d e a b -0.913707 1.015265 0 0.630905 -0.508003 0.467421 1.880421 0 0.886313 0.026921 1 -0.720613 1.027585 -0.314128 -0.756686 0 0.317922 -0.431624 1 -1.154708 -0.370363 2 0.400752 -0.000786 0.488310 -0.230924 0 1.303703 -1.414924 1 0.242020 1.401058 2 -0.369507 0.648304 3 1.491819 1.010083 1.248220 -0.351634 0 0.106272 0.518489 1 -1.916420 -0.068814 2 -0.090406 -0.237604 3 -0.208762 0.163396 4 0.664643 -1.272215