I have some large text files that I want to bin into equal intervals, and then redefine one column as the sum of rows in each bin, and the redefine a second column as the index for each bin. Here is a simplified version of my dataset:
time RN NOR
0 100 0 0
1 101 0 0
2 104 0 0
3 105 0 0
4 107 0 0
5 110 0 0
6 114 0 0
7 115 0 0
8 116 0 0
9 118 0 0
Specifically, I want to bin the time column (which is in seconds) into bins every 5 seconds, for example. Then I want to calculate the NR column to be the number of returns for each bin, and the NR column to be the chronological return number for each individual bin. The output should look like this:
time RN NR bin
0 100 1 3 1
1 101 2 3 1
2 104 3 3 1
3 105 1 2 2
4 107 2 2 2
5 110 1 2 3
6 114 2 2 3
7 115 1 3 4
8 116 2 3 4
9 118 3 3 4
Is there a way to do this with python?
I have tried following this example here to create bins but I am stumped about how to add the row count and sum of rows to my binned dataset.
Create bin
column by subtract first minimal value with Series.floordiv
and add 1
, for RN
use counter by GroupBy.cumcount
and for NOR
mapping by Series.value_counts
:
df['bin'] = df['time'].sub(df['time'].min()).floordiv(5).add(1)
df['RN'] = df.groupby('bin').cumcount().add(1)
df['NOR'] = df['bin'].map(df['bin'].value_counts())
print (df)
time RN NOR bin
0 100 1 3 1
1 101 2 3 1
2 104 3 3 1
3 105 1 2 2
4 107 2 2 2
5 110 1 2 3
6 114 2 2 3
7 115 1 3 4
8 116 2 3 4
9 118 3 3 4