I am trying to normalize a column from a Pandas dataframe that is a list of dictionaries (can be missing).
Example to reproduce
import pandas as pd
bids = pd.Series([[{'price': 606, 'quantity': 28},{'price': 588, 'quantity': 29},
{'price': 513, 'quantity': 33}],[],[{'price': 7143, 'quantity': 15},
{'price': 68, 'quantity': 91},{'price': 6849, 'quantity': 12}]])
data = pd.DataFrame([1,2,3]).rename(columns={0:'id'})
data['bids'] = bids
Desired output
id price quantity
1 606 28
1 588 29
1 513 33
3 7143 15
3 68 91
3 6849 12
Attempt
Trying to resolve using pandas json_normalize, following docs here. I'm confused by why none of the below work, and what type of record_path will fix my problem. All the below error.
pd.json_normalize(data['bids'])
pd.json_normalize(data['bids'],['price','quantity'])
pd.json_normalize(data['bids'],[['price','quantity']])
Adding another approach with np.repeat
and np.concatenate
with json_normalize
out = pd.io.json.json_normalize(np.concatenate(data['bids']))
out.insert(0,'id',np.repeat(data['id'],data['bids'].str.len()).to_numpy())
Or you can also use np.hstack
as @Shubham mentions instead of np.concatenate
:
out = pd.io.json.json_normalize(np.hstack(data['bids']))
print(out)
id price quantity
0 1 606 28
1 1 588 29
2 1 513 33
3 3 7143 15
4 3 68 91
5 3 6849 12