Inserting rows of text before and after each change in a column's value in a pandas DataFrame


Python beginner here. I have a pandas DataFrame and would like to insert rows with text in the first column, based on the values in another column. My example data is below:

Current Dataframe:

value type section
0 brown sect1
1 green sect1
0 blue sect2
0 brown sect3
0 black sect4
1 yellow sect4
2 green sect4
0 blue sect5

Desired updated Dataframe:

value type section
START brown sect1
1 green sect1
END
START blue sect2
END
START brown sect3
END
START black sect4
1 yellow sect4
2 green sect4
END
START blue sect5
END

Note that the leading '0' of each section should be replaced with 'START', and an 'END' row should follow the last row of each section. A new 'START'/'END' pair is needed every time the section value changes.

My current DataFrame is set up as shown above, but I am not sure how to update it correctly to get this result. Any suggestions on how to handle this problem?


Solution

  • Using boolean indexing and repeat:

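    # flag the last row of each section (the next row belongs to a different section)
    m = df['section'].ne(df['section'].shift(-1))
    
    # repeat each flagged row once; the extra copy will become the END row
    idx = df.index.repeat(m.add(1))
    # 'value' will hold both numbers and strings, so store it as object
    out = df.loc[idx].astype({'value': object})
    
    # the second copy of each repeated row is an END row (assumes df has a unique index):
    # blank it out, then label it END
    m2 = idx.to_series().duplicated()
    out[m2] = ''
    out.loc[m2, 'value'] = 'END'
    # the first row overall and every row right after an END row open a section
    out.loc[m2.shift(fill_value=True), 'value'] = 'START'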
    

    Output:

       value    type section
    0  START   brown   sect1
    1      1   green   sect1
    1    END                
    2  START    blue   sect2
    2    END                
    3  START   brown   sect3
    3    END                
    4  START   black   sect4
    5      1  yellow   sect4
    6      2   green   sect4
    6    END                
    7  START    blue   sect5
    7    END
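
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same repeat-and-mask idea. It is written positionally with NumPy instead of index labels, so it does not depend on the index being unique. The helper name `add_start_end` is hypothetical, and the sample frame (including the integer dtype of `value`) is an assumption reconstructed from the question.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; the dtypes are an assumption.
df = pd.DataFrame({
    'value': [0, 1, 0, 0, 0, 1, 2, 0],
    'type': ['brown', 'green', 'blue', 'brown', 'black', 'yellow', 'green', 'blue'],
    'section': ['sect1', 'sect1', 'sect2', 'sect3', 'sect4', 'sect4', 'sect4', 'sect5'],
})


def add_start_end(frame):
    """Hypothetical helper: wrap each run of equal 'section' values in START/END rows."""
    if frame.empty:
        return frame.copy()
    sect = frame['section'].to_numpy()
    # True on the last row of every run of identical sections
    is_last = np.append(sect[1:] != sect[:-1], True)
    # Row positions, with the last row of each run listed twice
    pos = np.repeat(np.arange(len(frame)), is_last + 1)
    # Store 'value' as object so it can hold both numbers and strings
    out = frame.iloc[pos].astype({'value': object}).reset_index(drop=True)
    # The second copy of a repeated position becomes the END row
    is_end = np.append(False, pos[1:] == pos[:-1])
    # A START row is the first row overall or any row right after an END row
    is_start = np.append(True, is_end[:-1])
    out.loc[is_end, ['type', 'section']] = ''
    out.loc[is_end, 'value'] = 'END'
    # Like the answer above, this overwrites the first value of each run
    # whether or not it was 0
    out.loc[is_start, 'value'] = 'START'
    return out


result = add_start_end(df)
print(result)
```

Unlike the output above, this version resets the index to 0..n-1, so the END rows no longer share a label with the row they were copied from.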