python pandas dataframe state-machine

Differentiating between consecutive stages in a dataset


I am trying to label the sequential stages a machine goes through while it shuts down. The machine has to pass through several stages in its shutdown cycle, but for some stages it can move back to an earlier one. Not every stage can be identified from the data alone, because some stages show the same readings; where the machine is in the cycle can then only be determined from the timeline.

I created a sample dataset to give an example of the data:

import pandas as pd

data = {
  "Date and Time": ["2020-06-07 00:00", "2020-06-07 00:01", "2020-06-07 00:02", "2020-06-07 00:03", "2020-06-07 00:04", "2020-06-07 00:05", "2020-06-07 00:06", "2020-06-07 00:07", "2020-06-07 00:08", "2020-06-07 00:09", "2020-06-07 00:10", "2020-06-07 00:11", "2020-06-07 00:12", "2020-06-07 00:13", "2020-06-07 00:14", "2020-06-07 00:15", "2020-06-07 00:16", "2020-06-07 00:17", "2020-06-07 00:18", "2020-06-07 00:19", "2020-06-07 00:20", "2020-06-07 00:21", "2020-06-07 00:22", "2020-06-07 00:23", "2020-06-07 00:24", "2020-06-07 00:25", "2020-06-07 00:26", "2020-06-07 00:27", "2020-06-07 00:28", "2020-06-07 00:29"],
  "Current": [16.2, 15.1, 13.8, 12.0, 11.9, 12.1, 10.8, 9.8, 8.3, 6.2, 4.3, 4.2, 4.2, 3.3, 1.8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
  "Flow": [39.8, 40.3, 40.2, 40.1, 40.3, 39.8, 40.1, 40.2, 40.4, 39.6, 40, 39.3, 40.7, 38.9, 39.3, 0, 0, 39.3, 39.2, 0, 0, 38.9, 38.7, 0, 39.3, 39.2, 40.3, 0, 0, 0]
}

df = pd.DataFrame(data)

I already tried to distinguish between the phases with the following code:

# Calculate the change in current between consecutive data points
df["Current_ddt"] = df["Current"].diff()

# Determine which part of the shutdown the machine is in based on current and flow data
df.loc[(df["Current"] > 4.5) & (df["Current_ddt"] > -1), 'progress in shutdown cycle'] = 'Running'
df.loc[(df["Current"] > 4.5) & (df["Current_ddt"] <= -1), 'progress in shutdown cycle'] = 'Ramping down'
df.loc[(df["Current"] > 4) & (df["Current"] < 4.5) & (df["Current_ddt"] > -1), 'progress in shutdown cycle'] = 'Ramp down complete between 4-4.5'
df.loc[(df["Current"] < 4.5) & (df["Current"] != 0) & (df["Current_ddt"] < -1), 'progress in shutdown cycle'] = 'Shutdown'  # Not possible to go back to an earlier stage
df.loc[(df["Current"] == 0) & (df["Flow"] == 0), 'progress in shutdown cycle'] = 'de-energized'  # Not possible to go back to an earlier stage
df.loc[(df["Current"] == 0) & (df["Flow"] != 0), 'progress in shutdown cycle'] = 'flushing'  # Ideally this could distinguish the first, second and third flush

This part works OK up to the de-energized stage. Ultimately, I would like to distinguish between a normal ramp-down (i.e. going to a lower production level) and a ramp-down to 4.5. I am only interested in the real shutdown of the machine, because that is when the most damage can be done if it is performed incorrectly.

However, the part after de-energizing is giving me the most problems. There are three flushing cycles: the first is a general purge to empty the machine, while the second and the (optional) third flush make sure the machine is clean and ready for maintenance. The data for these cycles look identical, so I am looking for a sequential way to tell them apart, but I do not know how to do it.

The ideal output would be something like this:

Date and Time     Current  Flow  Current_ddt          Progress in shutdown cycle
2020-06-07 00:00  16.2     39.8
2020-06-07 00:01  15.1     40.3  -1.1                 Ramping down
2020-06-07 00:02  13.8     40.2  -1.3                 Ramping down
2020-06-07 00:03  12       40.1  -1.8                 Ramping down
2020-06-07 00:04  11.9     40.3  -0.0999999999999996  Running
2020-06-07 00:05  12.1     39.8  0.199999999999999    Running
2020-06-07 00:06  10.8     40.1  -1.3                 Ramping down
2020-06-07 00:07  9.8      40.2  -1                   Ramping down
2020-06-07 00:08  8.3      40.4  -1.5                 Ramping down
2020-06-07 00:09  6.2      39.6  -2.1                 Ramping down
2020-06-07 00:10  4.3      40    -1.9                 Shutdown
2020-06-07 00:11  4.2      39.3  -0.0999999999999996  Ramp down complete between 4-4.5
2020-06-07 00:12  4.2      40.7  0                    Ramp down complete between 4-4.5
2020-06-07 00:13  3.3      38.9  -0.9                 Shutdown
2020-06-07 00:14  1.8      39.3  -1.5                 Shutdown
2020-06-07 00:15  0        0     -1.8                 de-energized
2020-06-07 00:16  0        0     0                    de-energized
2020-06-07 00:17  0        39.3  0                    purging
2020-06-07 00:18  0        39.2  0                    purging
2020-06-07 00:19  0        0     0                    purged
2020-06-07 00:20  0        0     0                    purged
2020-06-07 00:21  0        38.9  0                    second flush
2020-06-07 00:22  0        38.7  0                    second flush
2020-06-07 00:23  0        0     0                    flushed
2020-06-07 00:24  0        39.3  0                    third flush
2020-06-07 00:25  0        39.2  0                    third flush
2020-06-07 00:26  0        40.3  0                    third flush
2020-06-07 00:27  0        0     0                    flushed and stopped
2020-06-07 00:28  0        0     0                    flushed and stopped
2020-06-07 00:29  0        0     0                    flushed and stopped

Any tips?


Solution

  • I've implemented a simple state machine based on the "Current" and "Flow" columns:

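    def state_machine():
        # Generator-based state machine: every send((current, flow)) returns a label.
        current_state = None
        # The first row only seeds `current`; there is no previous value to
        # diff against, so its label stays None.
        current, flow = yield
    
        while True:
            c, flow = yield current_state
    
            # Change in current since the previous row
            current_ddt = c - current
            current = c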
    
            if current > 4.5:
                if current_ddt <= -1:
                    current_state = "Ramping down"
                else:
                    current_state = "Running"
            elif current > 4:
                if current_ddt < -1:
                    current_state = "Shutdown"
                else:
                    current_state = "Ramp down complete between 4-4.5"
            elif current > 0:
                current_state = "Shutdown"
            else:
                states = iter(
                    [
                        "Purging",
                        "Purged",
                        "Second Flush",
                        "Flushed",
                        "Third Flush",
                        "Flushed and stopped",
                    ]
                )
    
                # current is == 0, check the flow:
                if flow == 0:
                    current_state = "De-energized"
                    waiting_for_zero = False
                else:
                    current_state = next(states)  # Purging
                    waiting_for_zero = True
    
                while True:
                    current, flow = yield current_state
    
                    if flow > 0 and waiting_for_zero is False:
                        current_state = next(states)
                        waiting_for_zero = True
                    elif flow == 0 and waiting_for_zero is True:
                        current_state = next(states)
                        waiting_for_zero = False
    
                    if current_state == "Flushed and stopped":
                        # We are stopped completely, don't react to changes of "current" and/or "flow"
                        while True:
                            yield current_state
    
    
    s = state_machine()
    next(s)  # prime the generator so that it can accept values via send()
    
    df["Progress in shutdown cycle"] = df.apply(
        lambda x: s.send((x["Current"], x["Flow"])), axis=1
    )
    
    print(df)
    

    Prints:

           Date and Time  Current  Flow        Progress in shutdown cycle
    0   2020-06-07 00:00     16.2  39.8                              None
    1   2020-06-07 00:01     15.1  40.3                      Ramping down
    2   2020-06-07 00:02     13.8  40.2                      Ramping down
    3   2020-06-07 00:03     12.0  40.1                      Ramping down
    4   2020-06-07 00:04     11.9  40.3                           Running
    5   2020-06-07 00:05     12.1  39.8                           Running
    6   2020-06-07 00:06     10.8  40.1                      Ramping down
    7   2020-06-07 00:07      9.8  40.2                      Ramping down
    8   2020-06-07 00:08      8.3  40.4                      Ramping down
    9   2020-06-07 00:09      6.2  39.6                      Ramping down
    10  2020-06-07 00:10      4.3  40.0                          Shutdown
    11  2020-06-07 00:11      4.2  39.3  Ramp down complete between 4-4.5
    12  2020-06-07 00:12      4.2  40.7  Ramp down complete between 4-4.5
    13  2020-06-07 00:13      3.3  38.9                          Shutdown
    14  2020-06-07 00:14      1.8  39.3                          Shutdown
    15  2020-06-07 00:15      0.0   0.0                      De-energized
    16  2020-06-07 00:16      0.0   0.0                      De-energized
    17  2020-06-07 00:17      0.0  39.3                           Purging
    18  2020-06-07 00:18      0.0  39.2                           Purging
    19  2020-06-07 00:19      0.0   0.0                            Purged
    20  2020-06-07 00:20      0.0   0.0                            Purged
    21  2020-06-07 00:21      0.0  38.9                      Second Flush
    22  2020-06-07 00:22      0.0  38.7                      Second Flush
    23  2020-06-07 00:23      0.0   0.0                           Flushed
    24  2020-06-07 00:24      0.0  39.3                       Third Flush
    25  2020-06-07 00:25      0.0  39.2                       Third Flush
    26  2020-06-07 00:26      0.0  40.3                       Third Flush
    27  2020-06-07 00:27      0.0   0.0               Flushed and stopped
    28  2020-06-07 00:28      0.0   0.0               Flushed and stopped
    29  2020-06-07 00:29      0.0   0.0               Flushed and stopped
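
If only the flush cycles need to be told apart, a vectorized alternative to the generator may be enough. The following is a minimal sketch, not part of the original answer, and it assumes that a flush is any run of rows with "Current" equal to 0 and "Flow" greater than 0. The column names `flush_no` and `flushing` are made up for illustration.

```python
import pandas as pd

# "Current" and "Flow" columns of the sample data from the question.
df = pd.DataFrame({
    "Current": [16.2, 15.1, 13.8, 12.0, 11.9, 12.1, 10.8, 9.8, 8.3, 6.2,
                4.3, 4.2, 4.2, 3.3, 1.8] + [0.0] * 15,
    "Flow": [39.8, 40.3, 40.2, 40.1, 40.3, 39.8, 40.1, 40.2, 40.4, 39.6,
             40.0, 39.3, 40.7, 38.9, 39.3, 0, 0, 39.3, 39.2, 0, 0, 38.9,
             38.7, 0, 39.3, 39.2, 40.3, 0, 0, 0],
})

# Assumption: a row is part of a flush when the machine is de-energized
# (current is zero) but there is still flow.
flushing = (df["Current"] == 0) & (df["Flow"] > 0)

# A flush starts on a row that is flushing while the previous row was not.
flush_start = flushing & ~flushing.shift(fill_value=False)

# Running count of flush starts: 0 before the first flush, then 1, 2, 3.
df["flush_no"] = flush_start.cumsum()

# True while flow is present, False in the pauses between flushes.
df["flushing"] = flushing

print(df.iloc[15:])
```

Combining `flush_no` with `flushing` gives the same distinction as the labels above: for example, `flush_no == 2` with `flushing == False` corresponds to "Flushed". Unlike the generator, this sketch does not stop counting after the third flush, so a fourth run of flow would simply get number 4.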