Search code examples
pandasdataframenansubtraction

Fill NaN values based on condition in pandas


I have the below dataframe:

data = [['1', 'Construction', '', '01/02/2022', '01/06/2022', '1', 'No'], ['N/A', 'Level Site', 'Construction', '01/02/2022', '01/02/2022', '2', 'No'], ['3', 'Foundation', '', '01/03/2023', '01/06/2023', '1', 'Yes'],['2', 'Lay Foundation', 'Construction>Foundation', '01/03/2022', '01/04/2022', '3', 'No'], ['N/A', 'Prepare land for foundation', 'Construction>Foundation', '01/05/2022', '01/06/2022', '3', 'No'],['2', 'Building Envelope', '', '01/07/2023', '01/16/2023', '1', 'No'], ['2', 'Install Footings', 'Building Envelope', '01/07/2022', '01/07/2022', '2', 'Yes'], ['2', 'Pouring', '', '01/08/202', '01/09/2023', '1', 'No'],['N/A', 'Pour Foundation', 'Building Envelope>Pouring', '01/08/2022', '01/09/2022', '3', 'No'], ['2', 'Installation', '', '01/09/2022', '01/14/2022', '1', 'No']]
df1 = pd.DataFrame(data, columns=['Outline_level',  'Activity', 'Parent', 'Start', 'Finish', 'WBS Level', 'Match'])

df1

I want to fill the N/A values in the 'Outline_Level' column with the value in the cell below it minus 1. For example, if the cell holds an 'N/A' value, and the cell below has a value of 3, I want the N/A cell to have the value of 2. I have tried the bfill method, which fills the N/A values but I can't figure out how to subtract one from those values.

Here is an example of my ideal dataframe:

Ideal_data = [['1', 'Construction', '', '01/02/2022', '01/06/2022', '1', 'No'], ['2', 'Level Site', 'Construction', '01/02/2022', '01/02/2022', '2', 'No'], ['3', 'Foundation', '', '01/03/2023', '01/06/2023', '1', 'Yes'],['2', 'Lay Foundation', 'Construction>Foundation', '01/03/2022', '01/04/2022', '3', 'No'], ['1', 'Prepare land for foundation', 'Construction>Foundation', '01/05/2022', '01/06/2022', '3', 'No'],['2', 'Building Envelope', '', '01/07/2023', '01/16/2023', '1', 'No'], ['2', 'Install Footings', 'Building Envelope', '01/07/2022', '01/07/2022', '2', 'Yes'], ['2', 'Pouring', '', '01/08/202', '01/09/2023', '1', 'No'],['1', 'Pour Foundation', 'Building Envelope>Pouring', '01/08/2022', '01/09/2022', '3', 'No'], ['2', 'Installation', '', '01/09/2022', '01/14/2022', '1', 'No']]
df2 = pd.DataFrame(Ideal_data, columns=['Outline_level',  'Activity', 'Parent', 'Start', 'Finish', 'WBS Level', 'Match'])

df2

Thank you for any help


Solution

  • Replace values on recognized indices and shift values using pd.Series.shift:

    idx = df1[df1['Outline_level'].eq('N/A')].index
    df1.loc[idx, 'Outline_level'] = df1['Outline_level'].shift(-1).iloc[idx].astype(int) - 1
    

      Outline_level                     Activity                     Parent  ...      Finish WBS Level Match
    0             1                 Construction                             ...  01/06/2022         1    No
    1             2                   Level Site               Construction  ...  01/02/2022         2    No
    2             3                   Foundation                             ...  01/06/2023         1   Yes
    3             2               Lay Foundation    Construction>Foundation  ...  01/04/2022         3    No
    4             1  Prepare land for foundation    Construction>Foundation  ...  01/06/2022         3    No
    5             2            Building Envelope                             ...  01/16/2023         1    No
    6             2             Install Footings          Building Envelope  ...  01/07/2022         2   Yes
    7             2                      Pouring                             ...  01/09/2023         1    No
    8             1              Pour Foundation  Building Envelope>Pouring  ...  01/09/2022         3    No
    9             2                 Installation                             ...  01/14/2022         1    No
    
    [10 rows x 7 columns]