0
Good morning everyone.
I have the following data:
import pandas as pd
info = {
'states': [-1, -1, -1, 1, 1, -1, 0, 1, 1, 1],
'values': [34, 29, 28, 30, 35, 33, 33, 36, 40, 41] }
df = pd.DataFrame(data=info)
print(df)
>>>
states values
0 -1 34
1 -1 29
2 -1 28
3 1 30
4 1 35
5 -1 33
6 0 33
7 1 36
8 1 40
9 1 41
I need to group the data using PANDAS (and/or higher order functions) (already did the exercise using for loops), I need to group the data having the "states" column as a guide. But the grouping should not be of all the data, I only need to group the data that is neighboring... as follows:
Initial DataFrame:
states values
0 -1 34 ┐
1 -1 29 │ Group this part (states = -1)
2 -1 28 ┘
3 1 30 ┐ Group this part (states = 1)
4 1 35 ┘
5 -1 33 'Group' this part (states = -1)
6 0 33 'Group' this part (states = 0)
7 1 36 ┐
8 1 40 │ Group this part (states = 1)
9 1 41 ┘
It would result in a DataFrame, with a grouping by segments (from the "states" column) and in another column the sum of the data (from the "values" column).
Expected DataFrame:
states values
0 -1 91 (values=34+29+28)
1 1 65 (values=30+35)
2 -1 33
3 0 33
4 1 117 (values=36+40+41)
You who are more versed in these issues, perhaps you can help me perform this operation.
Thank you so much!
Identify the blocks/groups of rows using diff
and cumsum
then group the dataframe by these blocks and aggregate states
with first
and values
with sum
b = df['states'].diff().ne(0).cumsum()
df.groupby(b).agg({'states': 'first', 'values': 'sum'})
Result
states values
states
1 -1 91
2 1 65
3 -1 33
4 0 33
5 1 117