I have a dataframe that has n number of columns. These contain letters, the amount of letters a column contains varies and a letter can appear in various amounts of columns. I need the code for a pandas dataframe to convert the sheet to columns starting with the letters, the rows should contain the numbers of the columns that that letter was in.
ABCDEF
ABDE. 11 1 BBCC -> 2 2 EFB. 3 3 4 4
The image describes my problem better. Thank you in advance for any help.
Use DataFrame.stack
with DataFrame.reset_index
for reshape, then DataFrame.sort_values
and aggregate list
s, last create DataFrame
by constructor with transpose:
s=df.stack().reset_index(name='a').sort_values('level_1').groupby('a')['level_1'].agg(list)
df1 = pd.DataFrame(s.tolist(), index=s.index).T
print (df1)
a a b c d e f
0 1 1 1 1 3 2
1 3 3 2 4 4 None
2 None 4 None None None None
Or use GroupBy.cumcount
for counter and reshape by DataFrame.pivot
:
df2 = df.stack().reset_index(name='a').sort_values('level_1')
df2['g'] = df2.groupby('a').cumcount()
df2 = df2.pivot('g','a','level_1')
print (df2)
a a b c d e f
g
0 1 1 1 1 3 2
1 3 3 2 4 4 NaN
2 NaN 4 NaN NaN NaN NaN
Last if necessary remove index and columns names:
df1 = df1.rename_axis(index=None)
df2 = df2.rename_axis(index=None, columns=None)