I have tow columns in my data frame that have multiple dictionarise in it and I want to expand it into multiple columns but the problem when I use explode series it mismatch
example:
Column A | Column B | Columns C |
---|---|---|
Cell 1 | {"a":0.5} {"b":0.5} |
{"d1":0.25}{"d2":0.25}{"d3":0.25}{"d4":0.25} |
Cell 2 | {"c":1.0} |
{"t1":0.5} {"t2":0.5} |
desirable output:
Column A | Column B1 | Columns B2 | Columns C1 | Columns C2 |
---|---|---|---|---|
Cell 1 | "a" | 0.5 | "d1" | 0.25 |
Cell 1 | "a" | 0.5 | "d2" | 0.25 |
Cell 1 | "b" | 0.5 | "d3" | 0.25 |
Cell 1 | "b" | 0.5 | "d4" | 0.25 |
Cell 2 | "c" | 1.0 | "t1" | 0.5 |
Cell 2 | "c" | 1.0 | "t2" | 0.5 |
i used to use
df1 = (
df.assign(Columns B2 =lambda df: df.Columns B.apply(lambda x: x.values()))
.reset_index()
.apply(pd.Series.explode)
)
output = (
df1.assign(Columns C2 =lambda df: df.Columns C.apply(lambda x: x.values()))
.reset_index()
.apply(pd.Series.explode)
)
the problem is that it mismatches the rows something in column C1 supposed to be in row 2 go to 3
Assuming this input and that Column B has the shortest dictionaries:
df = pd.DataFrame({'Column A': ['Cell 1', 'Cell 2'],
'Column B': [[{'a': 0.5}, {'b': 0.5}], [{'c': 1.0}]],
'Column C': [[{'d1': 0.25}, {'d2': 0.25}, {'d3': 0.25}, {'d4': 0.25}], [{'t1': 0.5}, {'t2': 0.5}]]})
You can try to repeat
the shortest list, then convert your key/value pairs as new DataFrames:
tmp = pd.DataFrame([[np.repeat(a, int(np.ceil(len(b)/len(a))))[:len(b)], b]
for a, b in zip(df['Column B'], df['Column C'])],
columns=['Column B', 'Column C']
).explode(['Column B', 'Column C'])
def convert(col):
s = col.apply(lambda x: list(x.items())[0])
return (pd.DataFrame(s.tolist(), index=s.index)
.rename(columns=lambda x: f'{col.name}{x+1}')
)
out = (df.drop(columns=['Column B', 'Column C'])
.join(pd.concat([convert(tmp[c]) for c in list(tmp)], axis=1))
)
Output:
Column A Column B1 Column B2 Column C1 Column C2
0 Cell 1 a 0.5 d1 0.25
0 Cell 1 a 0.5 d2 0.25
0 Cell 1 b 0.5 d3 0.25
0 Cell 1 b 0.5 d4 0.25
1 Cell 2 c 1.0 t1 0.50
1 Cell 2 c 1.0 t2 0.50