Search code examples
pythonpandasdataframeconditional-formatting

How to fill missing data in dataframe based on multiple conditions in the row and from data above


I have the following dataframe,

ID      LineID  TeamID  ShiftID DateTime        Production  Theoretical  Scrap
1       3       1       NULL    18/6/2020 4:00  482.5291    511.2351     
2       2       1       NULL    18/6/2020 5:00  467.8704    519.9842
3       1       1       NULL    18/6/2020 5:00  390.5945    480.2252    
2186    3       1       NULL    18/6/2020 5:00  0                        0.5
2520    2       1       NULL    18/6/2020 5:00  0                        21
2840    1       1       NULL    18/6/2020 6:00  0                        12
4       1       1       NULL    18/6/2020 6:00  389.2222    480.2252        
5       3       1       NULL    18/6/2020 6:00  516.0907    511.2351    
6       2       1       NULL    18/6/2020 6:00  450.5216    519.9842    
7       3       1       NULL    18/6/2020 6:00  397.9998    511.2351    
8       2       1       NULL    18/6/2020 7:00  456.9486    519.9842    
9       1       1       NULL    18/6/2020 7:00  414.6932    480.2252        
1939    2       1       NULL    18/6/2020 7:00  0                        24
2462    3       1       NULL    18/6/2020 7:00  0                        3
3075    1       1       NULL    18/6/2020 7:00  0                        3.5
1
......
......
......
114678  1       1       NULL    18/6/2018 22:00 343.5955        
114798  3       1       NULL    18/6/2018 22:00 191.2512        
114888  2       1       NULL    18/6/2018 22:00 190.5125        
114657  2       1       NULL    18/6/2018 22:00 414.6432        
114738  1       1       NULL    18/6/2018 22:00 429.43      
114885  3       1       NULL    18/6/2018 23:00 361.3246        
114756  1       1       NULL    18/6/2018 23:00 409.51      

I need to fill in the columns where Theoretical is empty, but only where the scrap is also empty.

So the condition is, when the LineID is 3 the theoretical is always 511.2351, when its 2 the theoretical is always 519.9842 and when its 1 its always 480.2252. But when there is a scrap value, the theoretical should be empty.

I can't seem to figure out a forward fill method this way.

I tried the following code, but all the other rows become NaN except those rows.

df['Theoretical'] = np.select([(df['LineID']==3) & (df['Production']>0) & (df['Theoretical']==0) & (df['Scrap']==0),
                        (df['LineID']==2) & (df['Production']>0) & (df['Theoretical']==0) & (df['Scrap']==0),
                        (df['LineID']==1) & (df['Production']>0) & (df['Theoretical']==0) & (df['Scrap']==0),],
                       (511.2351,519.9842,480.2252), np.nan)

ID      LineID  TeamID  ShiftID DateTime        Production  Theoretical  Scrap
1       3       1       NULL    18/6/2020 4:00  
2       2       1       NULL    18/6/2020 5:00  
3       1       1       NULL    18/6/2020 5:00 
2186    3       1       NULL    18/6/2020 5:00  
2520    2       1       NULL    18/6/2020 5:00  
2840    1       1       NULL    18/6/2020 6:00  
4       1       1       NULL    18/6/2020 6:00      
5       3       1       NULL    18/6/2020 6:00  
6       2       1       NULL    18/6/2020 6:00     
7       3       1       NULL    18/6/2020 6:00   
8       2       1       NULL    18/6/2020 7:00    
9       1       1       NULL    18/6/2020 7:00   
1939    2       1       NULL    18/6/2020 7:00 
2462    3       1       NULL    18/6/2020 7:00  
3075    1       1       NULL    18/6/2020 7:00  
1
......
......
......
114678  1       1       NULL    18/6/2018 22:00 343.5955    480.2252    
114798  3       1       NULL    18/6/2018 22:00 191.2512    511.2351        
114888  2       1       NULL    18/6/2018 22:00 190.5125    519.9842    
114657  2       1       NULL    18/6/2018 22:00 414.6432    519.9842
114738  1       1       NULL    18/6/2018 22:00 429.43      480.2252
114885  3       1       NULL    18/6/2018 23:00 361.3246    511.2351    
114756  1       1       NULL    18/6/2018 23:00 409.51      480.2252

I need it to be like this

ID      LineID  TeamID  ShiftID DateTime        Production  Theoretical  Scrap
1       3       1       NULL    18/6/2020 4:00  482.5291    511.2351    
2       2       1       NULL    18/6/2020 5:00  467.8704    519.9842
3       1       1       NULL    18/6/2020 5:00  390.5945    480.2252    
2186    3       1       NULL    18/6/2020 5:00  0                        0.5
2520    2       1       NULL    18/6/2020 5:00  0                        21
2840    1       1       NULL    18/6/2020 6:00  0                        12
4       1       1       NULL    18/6/2020 6:00  389.2222    480.2252        
5       3       1       NULL    18/6/2020 6:00  516.0907    511.2351    
6       2       1       NULL    18/6/2020 6:00  450.5216    519.9842    
7       3       1       NULL    18/6/2020 6:00  397.9998    511.2351    
8       2       1       NULL    18/6/2020 7:00  456.9486    519.9842    
9       1       1       NULL    18/6/2020 7:00  414.6932    480.2252        
1939    2       1       NULL    18/6/2020 7:00  0                        24
2462    3       1       NULL    18/6/2020 7:00  0                        3
3075    1       1       NULL    18/6/2020 7:00  0                        3.5
1
......
......
......
114678  1       1       NULL    18/6/2018 22:00 343.5955    480.2252    
114798  3       1       NULL    18/6/2018 22:00 191.2512    511.2351        
114888  2       1       NULL    18/6/2018 22:00 190.5125    519.9842    
114657  2       1       NULL    18/6/2018 22:00 414.6432    519.9842
114738  1       1       NULL    18/6/2018 22:00 429.43      480.2252
114885  3       1       NULL    18/6/2018 23:00 361.3246    511.2351    
114756  1       1       NULL    18/6/2018 23:00 409.51      480.2252    


Solution

  • Surely it is not the best solution, but you can try something like follows

    df_new = pd.DataFrame({
        "LineID":[1, 2, 3, 1, 2, 1, 1, 2, 3, 1, 2, 1], 
        "Theoretical": [480.2252, 519.9842, 511.2351, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
        "Scrap": [np.nan, 0.5, 21, np.nan, 24, np.nan, 40, 34,  np.nan, 0.4,  np.nan, 10]
    })
    
    df_new 
        LineID  Theoretical     Scrap
    0   1   480.2252    NaN
    1   2   519.9842    0.5
    2   3   511.2351    21.0
    3   1   NaN     NaN
    4   2   NaN     24.0
    5   1   NaN     NaN
    6   1   NaN     40.0
    7   2   NaN     34.0
    8   3   NaN     NaN
    9   1   NaN     0.4
    10  2   NaN     NaN
    11  1   NaN     10.0
    
    df_new.loc[(df_new["Theoretical"].isna()) & (df_new["Scrap"].isna()) & (df_new["LineID"] == 3), "Theoretical"] = 511.2351
    df_new.loc[(df_new["Theoretical"].isna()) & (df_new["Scrap"].isna()) & (df_new["LineID"] == 2), "Theoretical"] = 519.9842
    df_new.loc[(df_new["Theoretical"].isna()) & (df_new["Scrap"].isna()) & (df_new["LineID"] == 1), "Theoretical"] = 480.2252
    
    df_new
    
        LineID  Theoretical     Scrap
    0   1   480.2252    NaN
    1   2   519.9842    0.5
    2   3   511.2351    21.0
    3   1   480.2252    NaN
    4   2   NaN     24.0
    5   1   480.2252    NaN
    6   1   NaN     40.0
    7   2   NaN     34.0
    8   3   511.2351    NaN
    9   1   NaN     0.4
    10  2   519.9842    NaN
    11  1   NaN     10.0