Label rows when the value changes in pandas


I need a solution for the following problem. I have a timestamp and a value. From one row to the next, the value can increase, decrease, or stay the same. As soon as it increases or stays steady, I want to add a label in a new column. While the value continues to increase, each row should get the same label. As soon as the value decreases, the label should be zero. Each new increasing stretch should get the next label number (1, 2, 3, ...), as in the desired output below. Can anyone help me?

Input Data

import pandas as pd

df_raw = pd.DataFrame(
    {
        "timestamp": [
            "2017-06-16 05:19:18.993",
            "2017-06-16 05:19:28.993",
            "2017-06-16 05:19:38.993",
            "2017-06-16 05:19:48.993",
            "2017-06-16 05:19:58.993",
            "2017-06-16 05:25:08.993",
            "2017-06-16 05:25:18.993",
            "2017-06-16 07:44:28.993",
            "2017-06-16 07:45:38.993",
        ],
        "signalvalue": [0.0, 12.0, 22.0, 13.0, 0.0, 30.0, 0.0, 3.0, 6.0],
    }
)

    timestamp                signalvalue
0   2017-06-16 05:19:18.993  0.0
1   2017-06-16 05:19:28.993  12.0
2   2017-06-16 05:19:38.993  22.0
3   2017-06-16 05:19:48.993  13.0
4   2017-06-16 05:19:58.993  0.0
5   2017-06-16 05:25:08.993  30.0
6   2017-06-16 05:25:18.993  0.0
7   2017-06-16 07:44:28.993  3.0
8   2017-06-16 07:45:38.993  6.0

Desired Output

    timestamp                signalvalue    label
0   2017-06-16 05:19:18.993  0.0            0
1   2017-06-16 05:19:28.993  12.0           1
2   2017-06-16 05:19:38.993  22.0           1
3   2017-06-16 05:19:48.993  13.0           0
4   2017-06-16 05:19:58.993  0.0            0
5   2017-06-16 05:25:08.993  30.0           2
6   2017-06-16 05:25:18.993  0.0            0
7   2017-06-16 07:44:28.993  3.0            3
8   2017-06-16 07:45:38.993  6.0            3

Solution

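  • Compute a boolean mask that is True wherever the difference between successive values is greater than zero. Then flag only the first row of each True stretch and take the cumulative sum of those flags to number the stretches. Multiplying by the mask sets the label to zero outside the stretches: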

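    # True where the value increased compared with the previous row
    m1 = df_raw['signalvalue'].diff().gt(0)

    # Count the first row of each True stretch, then zero out rows outside any stretch
    df_raw['label'] = (m1 & m1.ne(m1.shift())).cumsum() * m1.astype(int)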
    

    Output:

                     timestamp  signalvalue  label
    0  2017-06-16 05:19:18.993          0.0      0
    1  2017-06-16 05:19:28.993         12.0      1
    2  2017-06-16 05:19:38.993         22.0      1
    3  2017-06-16 05:19:48.993         13.0      0
    4  2017-06-16 05:19:58.993          0.0      0
    5  2017-06-16 05:25:08.993         30.0      2
    6  2017-06-16 05:25:18.993          0.0      0
    7  2017-06-16 07:44:28.993          3.0      3
    8  2017-06-16 07:45:38.993          6.0      3
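
The question also asks for rows where the value stays steady to be labelled, while the mask above uses `gt(0)` and so treats a steady value as not rising. The sample data contains no steady rows, so the intended behaviour there is a guess. Below is a minimal sketch of the same idea wrapped in a function; the name `label_rising_runs` and the `include_steady` flag are hypothetical and not part of the original answer. It assumes that switching to `ge(0)` is what "stays steady" should mean.

```python
import pandas as pd


def label_rising_runs(values: pd.Series, include_steady: bool = False) -> pd.Series:
    """Number consecutive rising stretches 1, 2, 3, ...; all other rows get 0.

    `include_steady` is a hypothetical switch: when True, a row equal to the
    previous one also counts as part of a stretch (diff >= 0 instead of > 0).
    """
    diff = values.diff()  # first row is NaN, so it never counts as rising
    rising = diff.ge(0) if include_steady else diff.gt(0)

    # A stretch starts where the mask is True and the previous row's mask was False
    starts = rising & ~rising.shift(fill_value=False)

    # The running count of starts gives the stretch number; rows outside a stretch get 0
    return starts.cumsum().where(rising, 0)


signal = pd.Series([0.0, 12.0, 22.0, 13.0, 0.0, 30.0, 0.0, 3.0, 6.0])
labels = label_rising_runs(signal)
print(labels.tolist())
```

With the sample data this should reproduce the `label` column shown above. Calling `label_rising_runs(df_raw['signalvalue'], include_steady=True)` would additionally keep a stretch going across repeated values, if that is the behaviour wanted.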