I have the following dataframe,
ID LineID TeamID ShiftID DateTime Production Theoretical Scrap
1 3 1 NULL 18/6/2020 4:00 482.5291 511.2351
2 2 1 NULL 18/6/2020 5:00 467.8704 519.9842
3 1 1 NULL 18/6/2020 5:00 390.5945 480.2252
2186 3 1 NULL 18/6/2020 5:00 0 0.5
2520 2 1 NULL 18/6/2020 5:00 0 21
2840 1 1 NULL 18/6/2020 6:00 0 12
4 1 1 NULL 18/6/2020 6:00 389.2222 480.2252
5 3 1 NULL 18/6/2020 6:00 516.0907 511.2351
6 2 1 NULL 18/6/2020 6:00 450.5216 519.9842
7 3 1 NULL 18/6/2020 6:00 397.9998 511.2351
8 2 1 NULL 18/6/2020 7:00 456.9486 519.9842
9 1 1 NULL 18/6/2020 7:00 414.6932 480.2252
1939 2 1 NULL 18/6/2020 7:00 0 24
2462 3 1 NULL 18/6/2020 7:00 0 3
3075 1 1 NULL 18/6/2020 7:00 0 3.5
1
......
......
......
114678 1 1 NULL 18/6/2018 22:00 343.5955
114798 3 1 NULL 18/6/2018 22:00 191.2512
114888 2 1 NULL 18/6/2018 22:00 190.5125
114657 2 1 NULL 18/6/2018 22:00 414.6432
114738 1 1 NULL 18/6/2018 22:00 429.43
114885 3 1 NULL 18/6/2018 23:00 361.3246
114756 1 1 NULL 18/6/2018 23:00 409.51
I need to fill in the columns where Theoretical is empty, but only where the scrap is also empty.
So the condition is, when the LineID is 3
the theoretical is always 511.2351
, when its 2
the theoretical is always 519.9842 and when its 1
its always 480.2252. But when there is a scrap value, the theoretical should be empty.
I can't seem to figure out a forward fill method this way.
I tried the following code, but all the other rows become NaN except those rows.
df['Theoretical'] = np.select([(df['LineID']==3) & (df['Production']>0) & (df['Theoretical']==0) & (df['Scrap']==0),
(df['LineID']==2) & (df['Production']>0) & (df['Theoretical']==0) & (df['Scrap']==0),
(df['LineID']==1) & (df['Production']>0) & (df['Theoretical']==0) & (df['Scrap']==0),],
(511.2351,519.9842,480.2252), np.nan)
ID LineID TeamID ShiftID DateTime Production Theoretical Scrap
1 3 1 NULL 18/6/2020 4:00
2 2 1 NULL 18/6/2020 5:00
3 1 1 NULL 18/6/2020 5:00
2186 3 1 NULL 18/6/2020 5:00
2520 2 1 NULL 18/6/2020 5:00
2840 1 1 NULL 18/6/2020 6:00
4 1 1 NULL 18/6/2020 6:00
5 3 1 NULL 18/6/2020 6:00
6 2 1 NULL 18/6/2020 6:00
7 3 1 NULL 18/6/2020 6:00
8 2 1 NULL 18/6/2020 7:00
9 1 1 NULL 18/6/2020 7:00
1939 2 1 NULL 18/6/2020 7:00
2462 3 1 NULL 18/6/2020 7:00
3075 1 1 NULL 18/6/2020 7:00
1
......
......
......
114678 1 1 NULL 18/6/2018 22:00 343.5955 480.2252
114798 3 1 NULL 18/6/2018 22:00 191.2512 511.2351
114888 2 1 NULL 18/6/2018 22:00 190.5125 519.9842
114657 2 1 NULL 18/6/2018 22:00 414.6432 519.9842
114738 1 1 NULL 18/6/2018 22:00 429.43 480.2252
114885 3 1 NULL 18/6/2018 23:00 361.3246 511.2351
114756 1 1 NULL 18/6/2018 23:00 409.51 480.2252
I need it to be like this
ID LineID TeamID ShiftID DateTime Production Theoretical Scrap
1 3 1 NULL 18/6/2020 4:00 482.5291 511.2351
2 2 1 NULL 18/6/2020 5:00 467.8704 519.9842
3 1 1 NULL 18/6/2020 5:00 390.5945 480.2252
2186 3 1 NULL 18/6/2020 5:00 0 0.5
2520 2 1 NULL 18/6/2020 5:00 0 21
2840 1 1 NULL 18/6/2020 6:00 0 12
4 1 1 NULL 18/6/2020 6:00 389.2222 480.2252
5 3 1 NULL 18/6/2020 6:00 516.0907 511.2351
6 2 1 NULL 18/6/2020 6:00 450.5216 519.9842
7 3 1 NULL 18/6/2020 6:00 397.9998 511.2351
8 2 1 NULL 18/6/2020 7:00 456.9486 519.9842
9 1 1 NULL 18/6/2020 7:00 414.6932 480.2252
1939 2 1 NULL 18/6/2020 7:00 0 24
2462 3 1 NULL 18/6/2020 7:00 0 3
3075 1 1 NULL 18/6/2020 7:00 0 3.5
1
......
......
......
114678 1 1 NULL 18/6/2018 22:00 343.5955 480.2252
114798 3 1 NULL 18/6/2018 22:00 191.2512 511.2351
114888 2 1 NULL 18/6/2018 22:00 190.5125 519.9842
114657 2 1 NULL 18/6/2018 22:00 414.6432 519.9842
114738 1 1 NULL 18/6/2018 22:00 429.43 480.2252
114885 3 1 NULL 18/6/2018 23:00 361.3246 511.2351
114756 1 1 NULL 18/6/2018 23:00 409.51 480.2252
Surely it is not the best solution, but you can try something like follows
df_new = pd.DataFrame({
"LineID":[1, 2, 3, 1, 2, 1, 1, 2, 3, 1, 2, 1],
"Theoretical": [480.2252, 519.9842, 511.2351, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
"Scrap": [np.nan, 0.5, 21, np.nan, 24, np.nan, 40, 34, np.nan, 0.4, np.nan, 10]
})
df_new
LineID Theoretical Scrap
0 1 480.2252 NaN
1 2 519.9842 0.5
2 3 511.2351 21.0
3 1 NaN NaN
4 2 NaN 24.0
5 1 NaN NaN
6 1 NaN 40.0
7 2 NaN 34.0
8 3 NaN NaN
9 1 NaN 0.4
10 2 NaN NaN
11 1 NaN 10.0
df_new.loc[(df_new["Theoretical"].isna()) & (df_new["Scrap"].isna()) & (df_new["LineID"] == 3), "Theoretical"] = 511.2351
df_new.loc[(df_new["Theoretical"].isna()) & (df_new["Scrap"].isna()) & (df_new["LineID"] == 2), "Theoretical"] = 519.9842
df_new.loc[(df_new["Theoretical"].isna()) & (df_new["Scrap"].isna()) & (df_new["LineID"] == 1), "Theoretical"] = 480.2252
df_new
LineID Theoretical Scrap
0 1 480.2252 NaN
1 2 519.9842 0.5
2 3 511.2351 21.0
3 1 480.2252 NaN
4 2 NaN 24.0
5 1 480.2252 NaN
6 1 NaN 40.0
7 2 NaN 34.0
8 3 511.2351 NaN
9 1 NaN 0.4
10 2 519.9842 NaN
11 1 NaN 10.0