
Creating a column by addition of two adjacent rows with a condition


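I want to create a column E that is filled from column C. If D is < 10, E should combine the C values of the earlier row and the current row (for example M+N); otherwise E is just the row's own C.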

This is my Input DataSet:

I,A,B,C,D
1,P,100+,L,15
2,P,100+,M,9
3,P,100+,N,15
4,P,100+,O,15
5,Q,100+,L,2
6,Q,100+,M,15
7,Q,100+,N,3
8,Q,100+,O,15
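
For anyone who wants to reproduce this, the table above can be loaded into a DataFrame roughly as follows. This is only a convenience sketch; the name `CSV_TEXT` is illustrative and not part of my actual code.

```python
import io

import pandas as pd

# Input data copied verbatim from the table above.
CSV_TEXT = """I,A,B,C,D
1,P,100+,L,15
2,P,100+,M,9
3,P,100+,N,15
4,P,100+,O,15
5,Q,100+,L,2
6,Q,100+,M,15
7,Q,100+,N,3
8,Q,100+,O,15
"""

# read_csv accepts any file-like object, so StringIO avoids writing a file.
df = pd.read_csv(io.StringIO(CSV_TEXT))
print(df)
```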

I tried using some for loops, but I think this can be done with the shift or append functions. When I use shift, however, I get value errors.

Desired Output:

I,A,B,C,D,E
1,P,100+,L,15,L
2,P,100+,M,9,M+N
3,P,100+,N,15,M+N
4,P,100+,O,15,O
5,Q,100+,L,2,L+O
6,Q,100+,M,15,M+N
7,Q,100+,N,3,M+N
8,Q,100+,O,15,L+O

Column E in the desired output table above is what I am trying to compute.


Solution

  • Using np.where and Series.shift

    import numpy as np
    import pandas as pd

    # df is the input DataFrame from the question.
    # Where D < 10, take C from the next row; otherwise keep the row's own C.
    df['E'] = np.where(df['D'] < 10, df['C'].shift(-1), df['C'])
    # Where the two values differ, join them as "C+E".
    df['E'] = df.apply(lambda x: x.C + '+' + x.E if x.C != x.E else x.C, axis=1)
    # F holds the previous row's E.
    df['F'] = df['E'].shift(1)
    # If the previous row holds a combined value, copy it into this row.
    df['E'] = df.apply(lambda x: x.F if '+' in str(x.F) else x.E, axis=1)

    # Drop the helper column.
    df = df.drop(columns='F')


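    Output (each row with D < 10 is paired with the row after it, so rows 5 to 8 differ from the desired output in the question)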

       I  A     B  C   D    E
    0  1  P  100+  L  15    L
    1  2  P  100+  M   9  M+N
    2  3  P  100+  N  15  M+N
    3  4  P  100+  O  15    O
    4  5  Q  100+  L   2  L+M
    5  6  Q  100+  M  15  L+M
    6  7  Q  100+  N   3  N+O
    7  8  Q  100+  O  15  N+O
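
The same steps can also be written without the row-wise `apply` calls. The following is a minimal sketch, not a definitive implementation: the function name `pair_with_next` and its `threshold` parameter are illustrative, and it assumes that a flagged row (D below the threshold) is never immediately followed by another flagged row. A flagged row in the last position has no successor and is left unchanged.

```python
import pandas as pd

# Input data from the question.
df = pd.DataFrame({
    "I": range(1, 9),
    "A": list("PPPPQQQQ"),
    "B": ["100+"] * 8,
    "C": list("LMNOLMNO"),
    "D": [15, 9, 15, 15, 2, 15, 3, 15],
})


def pair_with_next(frame, threshold=10):
    """Label each row with D < threshold, and the row after it, as 'C+next C'."""
    next_c = frame["C"].shift(-1)                  # C of the following row
    # A flagged row needs a successor to pair with.
    flagged = (frame["D"] < threshold) & next_c.notna()
    combined = frame["C"] + "+" + next_c           # e.g. "M+N"
    # Keep C where the row is not flagged, otherwise use the combined label.
    e = frame["C"].where(~flagged, combined)
    # The row right after a flagged row receives the same combined label.
    after_flagged = flagged.shift(1, fill_value=False)
    return e.where(~after_flagged, e.shift(1))


df["E"] = pair_with_next(df)
print(df)
```

On the sample data this gives the same column E as the output table above. If pairs must never cross a boundary between values of A, `frame.groupby("A")["C"].shift(-1)` could be used for `next_c` instead; that is an assumption about the requirements, not something the question states.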