Search code examples
pandasdataframestackstartswithpandas-melt

Split column into multiple columns when a row starts with a string


I am wanting to split a single column, in a dataframe, into multiple columns whenever a row in the column startswith a specific string. I have a dataframe with a single column like here:

import pandas as pd
lst = ['Group 123 nv-1', 'a, v', 's,b', 'y, i', 'Group  123 mt-d2', 'b, v',
                         'Group  123 id-01', 'n,m', 'x, y', 'z, m', 'l,b']
df = pd.DataFrame(lst, columns=['Group'])
print(df)


              Group
0    Group 123 nv-1
1              a, v
2               s,b
3              y, i
4   Group 123 mt-d2
5              b, v
6   Group 123 id-01
7               n,m
8              x, y
9              z, m
10              l,b

I want to split the Group column into new columns each time that a string starts with "Group 123":

Group 123 nv-1  Group 123 mt-d2  Group 123 id-01
a, v            b, v             n,m
s,b                              x, y
y, i                             z, m
                                 l,b

Solution

  • try this:

    pd.concat([sub.reset_index(drop=True) for _, sub in df.groupby(
        df.Group.str.contains(r'^Group\s+123').cumsum())], axis=1)
    >>>
    
        Group           Group           Group
    0   Group 123 nv-1  Group 123 mt-d2 Group 123 id-01
    1   a, v            b, v            n,m
    2   s,b             NaN             x, y
    3   y, i            NaN             z, m
    4   NaN             NaN             l,b