
Creating a new column with conditions in addition to the row value of the new column


How can I create a new column B from the values of column A, where each value of B can also depend on the value of B in the row above?

The value of B should follow these rules:

A == 0: B = value of B in the row above.
A == 1: B = 1.
A == 2: B = value of B in the row above + 1.

Current dataframe + desired outcome

                
Dataframe           Desired outcome  
A                   A   B
1                   1   1
0                   0   1
2                   2   2
0                   0   2
2                   2   3
0                   0   3
2                   2   4
0                   0   4
2                   2   5
0                   0   5
2                   2   6
0                   0   6
1                   1   1
0                   0   1
1                   1   1
0                   0   1
2                   2   2
0                   0   2
2                   2   3
0                   0   3
1                   1   1
0                   0   1
2                   2   2
0                   0   2


Data Frame
A <- c(1,0,2,0,2,0,2,0,2,0,2,0,1,0,1,0,2,0,2,0,1,0,2,0)
Bdesiredoutcome <- c(1,1,2,2,3,3,4,4,5,5,6,6,1,1,1,1,2,2,3,3,1,1,2,2)
df = data.frame(A,Bdesiredoutcome)

I tried using dplyr with mutate(), case_when() and lag(), but I keep running into errors because of the lag() call: B does not exist yet when lag(B) is evaluated, and lag(A) does not produce the desired outcome. Any ideas on how to solve this problem?

df <- df %>%
          mutate(B = case_when((A == 0) ~ lag(B), 
                               (A == 1) ~ 1,
                               (A == 2) ~ (lag(B)+1)
    ))

Error in UseMethod("mutate_") : 
  no applicable method for 'mutate_' applied to an object of class "function"
In addition: Warning message:

Solution

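  • We can create a grouping column with cumsum(A == 1), so that a new group starts at every row where A is 1, and then create the 'B' column as cumsum(A != 0) within each group. Rows with A == 0 leave the count unchanged and rows with A == 2 increase it by 1, which matches the rules in the question without needing lag().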

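    library(dplyr)
    df %>% 
       group_by(grp = cumsum(A == 1)) %>%   # start a new group at every A == 1
       mutate(B = cumsum(A != 0)) %>%       # count the non-zero rows within each group
       ungroup() %>%
       select(-grp) %>%                     # drop the helper column
       as.data.frame()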
    

    Output:

       A Bdesired B
    1  1        1 1
    2  0        1 1
    3  2        2 2
    4  0        2 2
    5  2        3 3
    6  0        3 3
    7  2        4 4
    8  0        4 4
    9  2        5 5
    10 0        5 5
    11 2        6 6
    12 0        6 6
    13 1        1 1
    14 0        1 1
    15 1        1 1
    16 0        1 1
    17 2        2 2
    18 0        2 2
    19 2        3 3
    20 0        3 3
    21 1        1 1
    22 0        1 1
    23 2        2 2
    24 0        2 2
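
  • For comparison, the three rules from the question can also be written as a row-by-row recurrence. The following is only a sketch in base R, not part of the answer above: next_b is a hypothetical helper name, and the starting value 0 for the row above the first row is an assumption, since the question does not define it.

```r
# Values of A taken from the question.
A <- c(1,0,2,0,2,0,2,0,2,0,2,0,1,0,1,0,2,0,2,0,1,0,2,0)

# next_b() is a hypothetical helper: given the previous value of B and the
# current value of A, it returns the current value of B.
next_b <- function(prev_b, a) {
  if (a == 1) {
    1              # A == 1: reset to 1
  } else if (a == 2) {
    prev_b + 1     # A == 2: row above + 1
  } else {
    prev_b         # A == 0: carry the row above forward
  }
}

# Reduce(..., accumulate = TRUE) keeps every intermediate value of B.
# init = 0 is an assumed value for the row above row 1; [-1] drops it again.
B <- Reduce(next_b, A, accumulate = TRUE, init = 0)[-1]

# Compare against the desired outcome given in the question.
Bdesiredoutcome <- c(1,1,2,2,3,3,4,4,5,5,6,6,1,1,1,1,2,2,3,3,1,1,2,2)
all(B == Bdesiredoutcome)
```

    This version evaluates the rows one at a time, so it is likely slower than the grouped cumsum on large data, but it stays close to how the rules are stated and may be easier to adapt if the rules change.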