python pandas dataframe data-science data-analysis

have two dictionary columns with different lengths want to use series explode but it mismatch

I have tow columns in my data frame that have multiple dictionarise in it and I want to expand it into multiple columns but the problem when I use explode series it mismatch

example:

Column A	Column B	Columns C
Cell 1	`{"a":0.5} {"b":0.5}`	`{"d1":0.25}{"d2":0.25}{"d3":0.25}{"d4":0.25}`
Cell 2	`{"c":1.0}`	`{"t1":0.5} {"t2":0.5}`

desirable output:

Column A	Column B1	Columns B2	Columns C1	Columns C2
Cell 1	"a"	0.5	"d1"	0.25
Cell 1	"a"	0.5	"d2"	0.25
Cell 1	"b"	0.5	"d3"	0.25
Cell 1	"b"	0.5	"d4"	0.25
Cell 2	"c"	1.0	"t1"	0.5
Cell 2	"c"	1.0	"t2"	0.5

i used to use

df1 = ( 
       df.assign(Columns B2 =lambda df: df.Columns B.apply(lambda x: x.values()))    
     
       .reset_index()
       .apply(pd.Series.explode)
     
            )
output = ( 
       df1.assign(Columns C2 =lambda df: df.Columns C.apply(lambda x: x.values()))    
     
       .reset_index()
       .apply(pd.Series.explode)
     
            )

the problem is that it mismatches the rows something in column C1 supposed to be in row 2 go to 3

Solution

Assuming this input and that Column B has the shortest dictionaries:

df = pd.DataFrame({'Column A': ['Cell 1', 'Cell 2'],
                   'Column B': [[{'a': 0.5}, {'b': 0.5}], [{'c': 1.0}]],
                   'Column C': [[{'d1': 0.25}, {'d2': 0.25}, {'d3': 0.25}, {'d4': 0.25}], [{'t1': 0.5}, {'t2': 0.5}]]})

You can try to repeat the shortest list, then convert your key/value pairs as new DataFrames:

tmp = pd.DataFrame([[np.repeat(a, int(np.ceil(len(b)/len(a))))[:len(b)], b]
                    for a, b in zip(df['Column B'], df['Column C'])],
                   columns=['Column B', 'Column C']
                   ).explode(['Column B', 'Column C'])

def convert(col):
    s = col.apply(lambda x: list(x.items())[0])
    return (pd.DataFrame(s.tolist(), index=s.index)
              .rename(columns=lambda x: f'{col.name}{x+1}')
            )

out = (df.drop(columns=['Column B', 'Column C'])
         .join(pd.concat([convert(tmp[c]) for c in list(tmp)], axis=1))
      )

Output:

  Column A Column B1  Column B2 Column C1  Column C2
0   Cell 1         a        0.5        d1       0.25
0   Cell 1         a        0.5        d2       0.25
0   Cell 1         b        0.5        d3       0.25
0   Cell 1         b        0.5        d4       0.25
1   Cell 2         c        1.0        t1       0.50
1   Cell 2         c        1.0        t2       0.50