I have the following dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame([])
df['Date'] = ['2020-01-01','2020-01-02','2020-01-03','2020-01-04','2020-01-05',
'2020-01-06','2020-01-07','2020-01-08','2020-01-09','2020-01-10',
'2020-01-11','2020-01-12','2020-01-13','2020-01-14','2020-01-15',
'2020-01-16','2020-01-17','2020-01-18','2020-01-19','2020-01-20']
df['Machine'] = ['A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A']
df['Signal'] = [0,1,2,0,1,3,0,0,0,3,0,1,0,0,3,0,1,0,0,1]
df['Status'] = 0
And the following function which generates a 'Status' column for the machine A. In the Signal col, 1 switches the machine on (Status col 1) which remains 1 until the machine receives either 2 or 3 which are signals to switch the machine status to 0 (off) until the machine receives Signal 1 again.
I've solved the issue of maintaining the previous Status row value of 1 or 0 with the below function:
def s_gen(dataset, Signal):
_status = 0
status0 = []
for (i) in Signal:
if _status == 0:
if i == 1:
_status = 1
elif _status == 1:
if (i == 2 or i==3):
_status = 0
status0.append(_status)
dataset['status0'] = status0
return dataset['status0']
df['Status'] = s_gen(df,df['Signal'])
df.drop('status0',axis=1,inplace = True)
df
This appends the newly created column to the dataframe. However I have a larger dataframe with many different values in the Machine column (grouped as series; A,A,A,B,B,B etc) and the results of the function cannot overlap. Using groupby didn't work. So I think the next step is to produce each sequence of 'Status' as a separate list and concatenate them before appending the whole series to the larger dataframe as part of a larger outer loop.
This is the desired outcome:
df = pd.DataFrame([])
df['Date'] = ['2020-01-01','2020-01-02','2020-01-03','2020-01-04','2020-01-05',
'2020-01-06','2020-01-07','2020-01-08','2020-01-09','2020-01-10',
'2020-01-11','2020-01-12','2020-01-13','2020-01-14','2020-01-15',
'2020-01-16','2020-01-17','2020-01-18','2020-01-19','2020-01-20',
'2020-01-01','2020-01-02','2020-01-03','2020-01-04','2020-01-05',
'2020-01-06','2020-01-07','2020-01-08','2020-01-09','2020-01-10',
'2020-01-11','2020-01-12','2020-01-13','2020-01-14','2020-01-15',
'2020-01-16','2020-01-17','2020-01-18','2020-01-19','2020-01-20']
df['Machine'] = ['A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A',
'B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B','B',]
df['Signal'] = [0,1,2,0,1,3,0,0,0,3,0,1,0,0,3,0,1,0,0,1,0,1,2,0,1,3,0,0,0,3,0,1,0,0,3,0,1,0,0,1]
df['Status'] = [0,1,0,0,1,0,0,0,0,0,0,1,1,1,0,0,1,1,1,1,0,1,0,0,1,0,0,0,0,0,0,1,1,1,0,0,1,1,1,1]
df
What I'm struggling with is, if the function processes each machine's data separately then appends it to the dataframe, it would have to loop through each machine, then concatenate all the Status series produced, then append that larger series to the dataframe.
This is what I've tried so far:
dfList = df[df['Machine']]
dfListU = pd.DataFrame([])
dfListU = dfList['Machine'].unique()
dfListU.flatten()
def s_gen2(item, dataset, Signal):
data = df[df.Machine==m]
for m in dfListU:
_status = 0
status0 = []
for (i) in Signal:
if _status == 0:
if i == 1:
_status = 1
elif _status == 1:
if (i == 2 or i==3):
_status = 0
#status0.append(_status)
dataset['status0'] = status0
return dataset['status0']
for i in dfListU:
df1 = pd.concat(i)
status0.append(_status)
df['Status'] = s_gen(df,df['Signal'])
df.drop('status0',axis=1,inplace = True)
df
Which results in the error - KeyError: "None of [Index(['A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A', 'A',\n 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B',\n 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],\n dtype='object')] are in the [columns]"
Is is better to loop the function through the dfListU (list of unique machines) then concatenate the outcome? I've tried to avoid using loops but couldn't find any other way to compare the previous status row with the same row in the Signal column.
Any help is sincerely appreciated.
A simple approach would be to map
the known statuses, then to groupby.ffill
them:
df['Status'] = (df['Signal']
.map({1:1, 2:0, 3:0})
.groupby(df['Machine']).ffill()
.fillna(0, downcast='infer')
)
Output:
Date Machine Signal Status
0 2020-01-01 A 0 0
1 2020-01-02 A 1 1
2 2020-01-03 A 2 0
3 2020-01-04 A 0 0
4 2020-01-05 A 1 1
5 2020-01-06 A 3 0
6 2020-01-07 A 0 0
7 2020-01-08 A 0 0
8 2020-01-09 A 0 0
9 2020-01-10 A 3 0
10 2020-01-11 A 0 0
11 2020-01-12 A 1 1
12 2020-01-13 A 0 1
13 2020-01-14 A 0 1
14 2020-01-15 A 3 0
15 2020-01-16 A 0 0
16 2020-01-17 A 1 1
17 2020-01-18 A 0 1
18 2020-01-19 A 0 1
19 2020-01-20 A 1 1
20 2020-01-01 B 0 0
21 2020-01-02 B 1 1
22 2020-01-03 B 2 0
23 2020-01-04 B 0 0
24 2020-01-05 B 1 1
25 2020-01-06 B 3 0
26 2020-01-07 B 0 0
27 2020-01-08 B 0 0
28 2020-01-09 B 0 0
29 2020-01-10 B 3 0
30 2020-01-11 B 0 0
31 2020-01-12 B 1 1
32 2020-01-13 B 0 1
33 2020-01-14 B 0 1
34 2020-01-15 B 3 0
35 2020-01-16 B 0 0
36 2020-01-17 B 1 1
37 2020-01-18 B 0 1
38 2020-01-19 B 0 1
39 2020-01-20 B 1 1