Search code examples
pythonpandasdataframegspread

Pandas Dataframe - (Column re structure)


I have a dataframe that has n number of columns. These contain letters, the amount of letters a column contains varies and a letter can appear in various amounts of columns. I need the code for a pandas dataframe to convert the sheet to columns starting with the letters, the rows should contain the numbers of the columns that that letter was in.

Link to example problem

enter image description here

  1.   ABCDEF
    

ABDE. 11 1 BBCC -> 2 2 EFB. 3 3 4 4

The image describes my problem better. Thank you in advance for any help.


Solution

  • Use DataFrame.stack with DataFrame.reset_index for reshape, then DataFrame.sort_values and aggregate lists, last create DataFrame by constructor with transpose:

    s=df.stack().reset_index(name='a').sort_values('level_1').groupby('a')['level_1'].agg(list)
    
    df1 = pd.DataFrame(s.tolist(), index=s.index).T
    print (df1)
    a     a  b     c     d     e     f
    0     1  1     1     1     3     2
    1     3  3     2     4     4  None
    2  None  4  None  None  None  None
    

    Or use GroupBy.cumcount for counter and reshape by DataFrame.pivot:

    df2 = df.stack().reset_index(name='a').sort_values('level_1')
    
    df2['g'] = df2.groupby('a').cumcount()
    
    df2 = df2.pivot('g','a','level_1')
    print (df2)
    a    a  b    c    d    e    f
    g                            
    0    1  1    1    1    3    2
    1    3  3    2    4    4  NaN
    2  NaN  4  NaN  NaN  NaN  NaN
    

    Last if necessary remove index and columns names:

    df1 = df1.rename_axis(index=None)
    df2 = df2.rename_axis(index=None, columns=None)