I need a solution for the following problem. I have a timestamp and a value. From one row to the next, the value can increase, decrease, or stay the same. As soon as it increases or stays the same, I want to add a label in a new column. If the value continues to increase, the same label should be added to the following rows. As soon as the value decreases, a zero should be entered as the label. Can anyone help me?
Input Data
import pandas as pd

df_raw = pd.DataFrame(
    {
        "timestamp": [
            "2017-06-16 05:19:18.993",
            "2017-06-16 05:19:28.993",
            "2017-06-16 05:19:38.993",
            "2017-06-16 05:19:48.993",
            "2017-06-16 05:19:58.993",
            "2017-06-16 05:25:08.993",
            "2017-06-16 05:25:18.993",
            "2017-06-16 07:44:28.993",
            "2017-06-16 07:45:38.993",
        ],
        "signalvalue": [0.0, 12.0, 22.0, 13.0, 0.0, 30.0, 0.0, 3.0, 6.0],
    }
)
timestamp signalvalue
0 2017-06-16 05:19:18.993 0.0
1 2017-06-16 05:19:28.993 12.0
2 2017-06-16 05:19:38.993 22.0
3 2017-06-16 05:19:48.993 13.0
4 2017-06-16 05:19:58.993 0.0
5 2017-06-16 05:25:08.993 30.0
6 2017-06-16 05:25:18.993 0.0
7 2017-06-16 07:44:28.993 3.0
8 2017-06-16 07:45:38.993 6.0
Desired Output
timestamp signalvalue label
0 2017-06-16 05:19:18.993 0.0 0
1 2017-06-16 05:19:28.993 12.0 1
2 2017-06-16 05:19:38.993 22.0 1
3 2017-06-16 05:19:48.993 13.0 0
4 2017-06-16 05:19:58.993 0.0 0
5 2017-06-16 05:25:08.993 30.0 2
6 2017-06-16 05:25:18.993 0.0 0
7 2017-06-16 07:44:28.993 3.0 3
8 2017-06-16 07:45:38.993 6.0 3
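You can compute a boolean mask from the diff of successive values, True where the diff is greater than zero. Then keep only the first item of each stretch of True values and take the cumsum to number the stretches; multiplying by the mask resets every non-increasing row to 0: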
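# True where the value increased compared with the previous row
m1 = df_raw['signalvalue'].diff().gt(0)
# count the starts of increasing stretches, then zero out all other rows
df_raw['label'] = (m1 & m1.ne(m1.shift())).cumsum() * m1.astype(int)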
Output:
timestamp signalvalue label
0 2017-06-16 05:19:18.993 0.0 0
1 2017-06-16 05:19:28.993 12.0 1
2 2017-06-16 05:19:38.993 22.0 1
3 2017-06-16 05:19:48.993 13.0 0
4 2017-06-16 05:19:58.993 0.0 0
5 2017-06-16 05:25:08.993 30.0 2
6 2017-06-16 05:25:18.993 0.0 0
7 2017-06-16 07:44:28.993 3.0 3
8 2017-06-16 07:45:38.993 6.0 3
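If you need this in more than one place, the same idea can be wrapped in a small helper. This is only a sketch: the name `label_rising_runs` and the `include_steady` flag are made up for illustration and are not part of pandas. The flag switches the comparison from `gt(0)` to `ge(0)`, on the assumption that a steady value should count as part of a stretch, as the question describes.

```python
import pandas as pd


def label_rising_runs(values: pd.Series, include_steady: bool = False) -> pd.Series:
    """Number each stretch of rising values 1, 2, 3, ...; all other rows get 0.

    Hypothetical helper for illustration, not a pandas function.
    """
    step = values.diff()
    # The first row has no predecessor, so its diff is NaN and compares as False.
    rising = step.ge(0) if include_steady else step.gt(0)
    # True only on the first row of each rising stretch.
    run_start = rising & ~rising.shift(fill_value=False)
    # Running count of stretches started so far, zeroed outside the stretches.
    return run_start.astype(int).cumsum() * rising.astype(int)


signal = pd.Series([0.0, 12.0, 22.0, 13.0, 0.0, 30.0, 0.0, 3.0, 6.0])
labels = label_rising_runs(signal)
print(labels.tolist())  # [0, 1, 1, 0, 0, 2, 0, 3, 3]

# A series with steady values, to show the effect of the flag.
steady = pd.Series([1.0, 1.0, 2.0, 0.0, 0.0])
print(label_rising_runs(steady).tolist())                       # [0, 0, 1, 0, 0]
print(label_rising_runs(steady, include_steady=True).tolist())  # [0, 1, 1, 0, 2]
```

With the default `include_steady=False` this reproduces the labels shown above. Assign the result back with `df_raw['label'] = label_rising_runs(df_raw['signalvalue'])`.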