Search code examples
pythondataframegroup-bypipeapply

Why don't pipe() or apply() work properly on this pandas groupby?


I have DataFrame with columns col1, col2, col3. I want to create another col4 which contains col2[n+3]/col2-1 for each group in col1 separately:

|col1 | col2 | col3|
|-----|------|-----|    
|  A  |  2   |  4  |
|  A  |  4   |  5  | 
|  A  |  7   |  7  | 
|  A  |  3   |  8  | 
|  A  |  7   |  3  | 
|  B  |  8   |  9  |
|  B  |  10  |  10 | 
|  B  |  8   |  9  |
|  B  |  20  |  15 |

The output should be:

|col1 | col2 | col3| col4|
|-----|------|-----|-----|    
|  A  |    2 |   4 |  0.5|  # (3/2-1)
|  A  |    4 |   5 | 0.75|  # (7/4-1)
|  A  |    7 |   7 |  NA |
|  A  |    3 |   8 |  NA |
|  A  |    7 |   3 |  NA |
|  B  |    8 |   9 | 1.5 |
|  B  |   10 |  10 |  NA |
|  B  |    8 |  9  |  NA |
|  B  |   20 |  15 |  NA |

My code is:

df['col4']= df.groupby('col1').apply(lambda x: x['col2'].shift(-3)/x['col2']-1)

which results in col4 with all entries 'NA'.

I also tried:

df['col4']= df.groupby('col1').pipe(lambda x: x['col2'].shift(-3)/x['col2']-1)

which ignores groups 'A' and 'B' and results in:

|col1 | col2 | col3| col4  |
|-----|------|-----|-------|      
|  A  |  2   |  4  | 0.5   |
|  A  |  4   |  5  | 0.75  |
|  A  |  7   |  7  | 0.1428|
|  A  |  3   |  8  | 2.33  |
|  A  |  7   |  3  | 0.1428|
|  B  |  8   |  9  | 1.5   |
|  B  | 10   | 10  | NA    |
|  B  |  8   |  9  | NA    |
|  B  | 20   | 15  | NA    |

Does anyone know how to do this task or fix my code?


Solution

  • IIUC:

    df['col4'] = df.groupby('col1')['col2'].transform(lambda x: x.shift(-3)) / df['col2'] - 1
    

    Output:

      col1  col2  col3  col4
    0    A     2     4  0.50
    1    A     4     5  0.75
    2    A     7     7   NaN
    3    A     3     8   NaN
    4    A     7     3   NaN
    5    B     8     9  1.50
    6    B    10    10   NaN
    7    B     8     9   NaN
    8    B    20    15   NaN
    

    Use transform to shift your 'col2' within each group then divide by 'col2' and subtract 1.