Search code examples
pythonpandasdataframesklearn-pandas

Dataframe applying function to rows with specific condition


Here is a sample from my dataframe:

id      DPT_DATE  TRANCHE_NO  TRAIN_NO  J_X  RES_HOLD_IND
0     2017-04-01       330.0    1234.0 -1.0         100.0
1     2017-04-01       330.0    1234.0  0.0          80.0
2     2017-04-02       331.0    1235.0 -1.0          91.0
3     2017-04-02       331.0    1235.0  0.0          83.0
4     2017-04-03       332.0    1236.0 -1.0          92.0
5     2017-04-03       332.0    1236.0  0.0          81.0
6     2017-04-04       333.0    1237.0 -1.0          87.0
7     2017-04-04       333.0    1237.0  0.0          70.0
8     2017-04-05       334.0    1238.0 -1.0          93.0
9     2017-04-05       334.0    1238.0  0.0          90.0
10    2017-04-06       335.0    1239.0 -1.0          89.0
11    2017-04-06       335.0    1239.0  0.0          85.0
12    2017-04-07       336.0    1240.0 -1.0          82.0
13    2017-04-07       336.0    1240.0  0.0          76.0

This is a dataframe for Trains' reservation, DPT_DATE= date of departure TRAIN_NO= number of train J_X= days before departure (J_X=0.0 means the day of departure, J_X=-1 means day after departure) and RES_HOLD_IND is the reservation hold that day

I want to create a new column so for each DPT_DATE and TRAIN_NO gives me the RES_HOLD_IND for the day J_X=-1

Example (I want this):

id      DPT_DATE  TRANCHE_NO  TRAIN_NO  J_X  RES_HOLD_IND  RES_J-1
0     2017-04-01       330.0    1234.0 -1.0         100.0  100.0
1     2017-04-01       330.0    1234.0  0.0          80.0  100.0
2     2017-04-02       331.0    1235.0 -1.0          91.0  91.0
3     2017-04-02       331.0    1235.0  0.0          83.0  91.0
4     2017-04-03       332.0    1236.0 -1.0          92.0  92.0
5     2017-04-03       332.0    1236.0  0.0          81.0  92.0
6     2017-04-04       333.0    1237.0 -1.0          87.0  87.0
7     2017-04-04       333.0    1237.0  0.0          70.0  87.0

Thank you for your help!


Solution

  • I think you need first filter by boolean indexing or query and then groupby with DataFrameGroupBy.ffill what works nice, if always -1 values are in first row per group:

    df['RES_J-1'] = df.query('J_X == -1')['RES_HOLD_IND']
    #alternative
    #df['RES_J-1'] = df.loc[df['J_X'] == -1, 'RES_HOLD_IND']
    
    df['RES_J-1'] = df.groupby(['DPT_DATE','TRAIN_NO'])['RES_J-1'].ffill()
    print (df)
          DPT_DATE  TRANCHE_NO  TRAIN_NO  J_X  RES_HOLD_IND  RES_J-1
    0   2017-04-01       330.0    1234.0 -1.0         100.0    100.0
    1   2017-04-01       330.0    1234.0  0.0          80.0    100.0
    2   2017-04-02       331.0    1235.0 -1.0          91.0     91.0
    3   2017-04-02       331.0    1235.0  0.0          83.0     91.0
    4   2017-04-03       332.0    1236.0 -1.0          92.0     92.0
    5   2017-04-03       332.0    1236.0  0.0          81.0     92.0
    6   2017-04-04       333.0    1237.0 -1.0          87.0     87.0
    7   2017-04-04       333.0    1237.0  0.0          70.0     87.0
    8   2017-04-05       334.0    1238.0 -1.0          93.0     93.0
    9   2017-04-05       334.0    1238.0  0.0          90.0     93.0
    10  2017-04-06       335.0    1239.0 -1.0          89.0     89.0
    11  2017-04-06       335.0    1239.0  0.0          85.0     89.0
    12  2017-04-07       336.0    1240.0 -1.0          82.0     82.0
    13  2017-04-07       336.0    1240.0  0.0          76.0     82.0
    

    If -1 is only one per group but not always first use:

    df['RES_J-1'] = df.groupby(['DPT_DATE','TRAIN_NO'])['RES_J-1']
                      .apply(lambda x: x.ffill().bfill())