Python beginner here. I have a pandas DataFrame and would like to insert rows with text in the first column, based on the values in another column. An example of my data is below:
Current DataFrame:
value | type | section |
---|---|---|
0 | brown | sect1 |
1 | green | sect1 |
0 | blue | sect2 |
0 | brown | sect3 |
0 | black | sect4 |
1 | yellow | sect4 |
2 | green | sect4 |
0 | blue | sect5 |
Desired updated DataFrame:
value | type | section |
---|---|---|
START | brown | sect1 |
1 | green | sect1 |
END | | |
START | blue | sect2 |
END | | |
START | brown | sect3 |
END | | |
START | black | sect4 |
1 | yellow | sect4 |
2 | green | sect4 |
END | | |
START | blue | sect5 |
END | | |
Note that I would like to replace the leading '0' of each section with 'START'. A 'START' row and an 'END' row need to appear whenever the section value changes to a new value.
I have my current Dataframe set up as seen above, but I truly am not sure how to correctly update the Dataframe to do this. Any suggestions on how to handle this problem?
Using boolean indexing and Index.repeat:
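# identify the last row of each section (the next row has a different section)
m = df['section'].ne(df['section'].shift(-1))
# repeat the last row of each group; the extra copy will become the END row
idx = df.index.repeat(m.add(1))
out = df.loc[idx]
# the second occurrence of a repeated index label marks an END row
m2 = idx.to_series().duplicated()
# clear the END rows
out[m2] = ''
# assign END to the last row of each group
out.loc[m2, 'value'] = 'END'
# assign START to the first row of each group (the very first row and every row after an END)
out.loc[m2.shift(fill_value=True), 'value'] = 'START'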
Output:
   value    type section
0  START   brown   sect1
1      1   green   sect1
1    END
2  START    blue   sect2
2    END
3  START   brown   sect3
3    END
4  START   black   sect4
5      1  yellow   sect4
6      2   green   sect4
6    END
7  START    blue   sect5
7    END
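For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The sample data is copied from the question; the function name `add_start_end` is just a hypothetical wrapper for illustration. It assumes the original index is unique (for example the default RangeIndex) and that `value` holds integers, so the column is cast to object before the strings are written, which should avoid dtype warnings or errors in newer pandas versions.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "value": [0, 1, 0, 0, 0, 1, 2, 0],
    "type": ["brown", "green", "blue", "brown",
             "black", "yellow", "green", "blue"],
    "section": ["sect1", "sect1", "sect2", "sect3",
                "sect4", "sect4", "sect4", "sect5"],
})


def add_start_end(df):
    """Hypothetical helper: mark each run of equal `section` values with START/END."""
    # True on the last row of each consecutive run of `section`
    last = df["section"].ne(df["section"].shift(-1))
    # Repeat those rows one extra time; the extra copy becomes the END row
    idx = df.index.repeat(last.add(1).to_numpy())
    # Assumption: `value` is numeric, so cast it to object before writing strings
    out = df.loc[idx].astype({"value": object})
    # The second occurrence of each repeated label is an END row
    # (this relies on the original index being unique)
    end = idx.to_series().duplicated()
    out[end] = ""                      # blank out every column of the END rows
    out.loc[end, "value"] = "END"
    # A START row is the very first row or any row directly after an END row
    out.loc[end.shift(fill_value=True), "value"] = "START"
    return out


result = add_start_end(df)
print(result)
```

The result keeps the duplicated index labels, as in the output above; calling `result.reset_index(drop=True)` afterwards gives a clean 0..n-1 index if that is preferred.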