Search code examples
pythonpandasdataframeconditional-statementsmultiple-columns

create new column with values from another column based on condition


I have a dataframe

A   B   Value   FY
1   5    a     2020
2   6    b     2020
3   7    c     2021
4   8    d     2021

I want to create a column 'prev_FY' which looks at the 'value' column and previous year and populates in current year row in FY column; my desired output is:

A   B   Value   FY     prev_FY
1   5    a     2020    
2   6    b     2020
3   7    c     2021      a
4   8    d     2021      b

I tried using pivottable but it does not work as the values remain the same as corresponding to the FY. SHIFT function is not feasible as I have millions of rows.


Solution

  • Use:

    df['g'] = df.groupby('FY').cumcount()
    df2 = df[['FY','Value','g']].assign(FY = df['FY'].add(1))
    
    df = df.merge(df2, on=['FY','g'], how='left', suffixes=('','_prev')).drop('g', axis=1)
    print (df)
       A  B Value    FY Value_prev
    0  1  5     a  2020        NaN
    1  2  6     b  2020        NaN
    2  3  7     c  2021          a
    3  4  8     d  2021          b