I have a data set that has 4 columns in it. One with random numbers, one with 1s and 0s, another that breaks it into segments and one that sees if the value is different than the one before it.
Ex Data Set:
Column1 Column2 Column3 Column4
10 1 1 1
10 1 1 0
14 1 1 1
14 1 1 0
16 0 0 1
17 1 2 1
17 1 2 0
19 1 2 1
20 0 0 1
24 0 0 1
25 1 3 1
25 1 3 0
25 1 3 0
29 1 3 1
30 1 3 1
30 1 3 0
I am trying to get the first and last value of column1 inside of each segment based off of the criteria of column 4. So inside of the segment, the first-time column 4 has a 1, I want that value returned and the last time column 4 has a 1, I want that value returned as well.
What I want returned:
Column3 Column45(first) Column6 (last)
1 10 14
2 17 19
3 25 30
What I have tried (does not work):
firstlast= (df.assign(fl=df['Column1'].abs()).groupby([['Column3' , 'Column4']]).size()['Column1'].agg([('min' , 'min'), ('max', 'max')]).add_prefix('Column1'))
print (firstlast)
Still trying to get use to Python so I am open to new ways of trying to achieve this!
Try this:
df_filtered = df[df['Column4'] == 1]
firstlast = df_filtered.groupby('Column3')['Column1'].agg([('Column5', 'first'), ('Column6', 'last')])