Search code examples
pythonpandasdataframemax

Finding the maximum value between two columns where one of them is shifted


My DataFrame is:

import pandas as pd
df = pd.DataFrame(
    {
       'a': [20, 9, 31, 40],
       'b': [1, 10, 17, 30],
    }
)

Expected output: Creating column c

    a   b   c
0  20   1  20
1   9  10  20
2  31  17  17
3  40  30  31

Steps:

c is the maximum value between df.b and df.a.shift(1).bfill().

enter image description here

My attempt:

df['temp'] = df.a.shift(1).bfill()
df['c'] = df[['temp', 'b']].max(axis=1)

Is it the cleanest way / best approach?


Solution

  • If you don't want the temporary column, then you can replace values on the shifted column using where() in a one-liner.

    df['c'] = df['a'].shift(1).bfill().where(lambda x: x>df['b'], df['b'])
    

    This is similar to the combine() method posted in the other answer, but this one does a vectorized comparison while, combine() does it element-wise so this should be much faster as the length of the dataframe increase.