Date MAC SENS_1 SENS_2 SENS_3 SENS_4 SENS_N
0 2023-06-08 MAC1 79.608346 64.2030 66.181849 149.590573 12.135441
1 2023-06-09 MAC1 80.655564 64.6430 67.636943 109.034809 12.278676
2 2023-06-08 MAC2 5.323341 4.7814 57.231430 51.422297 0.494853
3 2023-06-09 MAC2 5.375561 4.7788 63.701169 88.638871 0.471265
4 2023-06-08 MAC3 3.221366 3.0242 61.071023 113.812544 0.251471
5 2023-06-09 MAC3 3.277577 3.0300 67.280632 133.127433 0.274765
6 2023-06-08 MAC4 3.029806 2.6720 63.310941 72.074345 0.393294
7 2023-06-09 MAC4 3.129871 2.6850 72.727142 100.133454 0.408941
8 2023-06-08 MAC5 18.987204 18.1676 57.660920 98.760819 1.219735
9 2023-06-09 MAC5 19.158136 18.1766 61.598102 93.546477 1.270471
10 2023-06-08 MACN 38.190817 35.2416 68.589171 160.597307 2.531353
11 2023-06-09 MACN 38.393878 34.3196 67.710298 139.854140 2.675000
For this dataframe I would like to know how many SENSORS for each MACHINE create a HIGH-LOW situation for the latest date.
For example, the SENS_1 value of MAC1 for today (2023-06-09) is greater than yesterday's (2023-06-08) value, and the same holds for SENS_2, SENS_3 and SENS_N.
So what I need to return is a DataFrame of MACHINEs with the names and the number of SENSORS that meet the condition:
| MACHINE | SENSORS | N. of SENSORS |
|:---- |:------:| -----:|
| MAC1 | SENS_1,SENS_2,SENS_3,SENS_N | 4 |
| MAC2 | SENS_1,SENS_3,SENS_4 | 3 |
| MAC3 | SENS_1,SENS_2,SENS_3,SENS_4,SENS_N | 5 |
| MAC4 | SENS_1,SENS_2,SENS_3,SENS_4,SENS_N | 5 |
| MAC5 | SENS_1,SENS_2,SENS_3,SENS_N | 4 |
| MACN | SENS_1,SENS_N | 2 |
I can use:
high_lows = df.loc[(df['MAC'] == df['MAC'].shift(-1)) & (df['SENS_5'] < df['SENS_5'].shift(-1)), 'MAC']
to identify SENSORS with high-lows one by one. However, there are almost 500 machines and 150 sensors.
So I think I need to iterate over the columns by selecting the SENSOR columns after the MAC column:
df.iloc[:, 2:]
and check the condition for each.
However, this function will run every 2000 ms (2 s), so it has to be done in the fastest way possible.
What is the most efficient way of iterating over N columns for N groups and extracting the groups that match the condition?
Example data with NaNs:
Date MAC SENS_1 SENS_2 SENS_3 SENS_4 SENS_5
82 2023-06-08 MAC41 79.608346 NaN 66.181849 149.590573 12.135441
83 2023-06-09 MAC41 80.655564 NaN 67.636943 109.034809 12.278676
You can use:
import pandas as pd

def count(df):
    # per machine: True for each sensor whose values strictly increase;
    # diff's leading NaN is filled with True so it doesn't block the check
    m = df.iloc[:, 2:].diff().fillna(True).gt(0).all(axis=0)
    return pd.Series({'Sensors': ', '.join(m.index[m]), 'Count': m.sum()})

out = df.groupby('MAC', as_index=False).apply(count)
Output:
>>> out
MAC Sensors Count
0 MAC1 SENS_1, SENS_2, SENS_3, SENS_N 4
1 MAC2 SENS_1, SENS_3, SENS_4 3
2 MAC3 SENS_1, SENS_2, SENS_3, SENS_4, SENS_N 5
3 MAC4 SENS_1, SENS_2, SENS_3, SENS_4, SENS_N 5
4 MAC5 SENS_1, SENS_2, SENS_3, SENS_N 4
5 MACN SENS_1, SENS_N 2
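A note on the fillna(True) step: in the NaN example above (MAC41, where SENS_2 is missing on both days), diff() yields NaN both for the first row of every group and for the all-NaN column, and fillna(True) turns those into truthy values, so such a sensor still passes the .gt(0).all() check. A small sketch reproducing just that step on a hand-built two-row frame:

```python
import numpy as np
import pandas as pd

# The NaN example above (MAC41): SENS_2 is missing on both days.
g = pd.DataFrame({
    'Date': ['2023-06-08', '2023-06-09'],
    'MAC': ['MAC41', 'MAC41'],
    'SENS_1': [79.608346, 80.655564],
    'SENS_2': [np.nan, np.nan],
    'SENS_3': [66.181849, 67.636943],
})

# diff() leaves NaN for the first row and for the all-NaN column;
# fillna(True) makes those entries truthy, so gt(0).all() still passes,
# i.e. a sensor with no data on either day is counted as a rise.
m = g.iloc[:, 2:].diff().fillna(True).gt(0).all(axis=0)
print(m)
```

Whether counting an all-NaN sensor as a rise is desirable depends on your use case; if not, fillna(False) on the second row only (or dropping such columns first) inverts that choice.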
Alternative (as you have only today's and yesterday's rows per machine):
def count(df):
    # with exactly two rows per machine, compare today (row 1) to yesterday (row 0)
    m = df.iloc[1, 2:] - df.iloc[0, 2:] > 0
    return pd.Series({'Sensors': ', '.join(m.index[m]), 'Count': m.sum()})

out = df.groupby('MAC', as_index=False).apply(count)
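If the two-rows-per-machine layout is guaranteed, a fully vectorized sketch avoids groupby.apply entirely, which may matter at a 2-second refresh rate. Two assumptions on my part, not stated in the question: the frame is sorted by MAC then Date, and a NaN on either day should count as not-risen (unlike the fillna(True) variant above). Shown here on a cut-down version of the example frame:

```python
import numpy as np
import pandas as pd

# Cut-down example frame: two machines, two dates, three sensors,
# already sorted by MAC then Date (yesterday row first, today row second).
df = pd.DataFrame({
    'Date': ['2023-06-08', '2023-06-09'] * 2,
    'MAC': ['MAC1', 'MAC1', 'MAC2', 'MAC2'],
    'SENS_1': [79.608346, 80.655564, 5.323341, 5.375561],
    'SENS_2': [64.2030, 64.6430, 4.7814, 4.7788],
    'SENS_3': [66.181849, 67.636943, 57.231430, 63.701169],
})

sens = df.columns[2:]               # sensor column names
vals = df[sens].to_numpy()
rose = vals[1::2] > vals[0::2]      # today's row vs. yesterday's, per machine

out = pd.DataFrame({
    'MAC': df['MAC'].to_numpy()[::2],
    'Sensors': [', '.join(sens[mask]) for mask in rose],
    'Count': rose.sum(axis=1),
})
print(out)
```

Because the comparison is a single NumPy operation over the whole sensor block, the cost no longer scales with the number of per-group Python calls, only with the array size.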