Consider the data frame below:
nameID titleID year dummy
1 a b 1999 1
2 e c 1999 1
3 i d 2000 0
4 o f 2000 0
5 a f 2000 1
6 e g 2001 0
7 i h 2002 0
8 i j 2003 0
9 u k 2003 1
10 o l 2004 1
11 a m 2004 0
12 o m 2004 0
13 u n 2005 0
I need a script that will add a new column, "dummycount", and assign either value 0 or 1, depending on the following conditions:
Here is an example of the desired output, that I put together:
nameID titleID year dummy dummycount
1 a b 1999 1 0
2 e c 1999 1 0
3 i d 2000 0 0
4 o f 2000 0 0
5 a f 2000 1 1
6 e g 2001 0 1
7 i h 2002 0 0
8 i j 2003 0 0
9 u k 2003 1 0
10 o l 2004 1 0
11 a m 2004 0 1
12 o m 2004 0 1
13 u n 2005 0 1
As you see, "dummycounts" only takes the value 1, if the "nameID" has at least one previous, single occurrence of 1 in the "dummy" column.
Thank you for your help!
We need lag
library(dplyr)
df1 %>%
group_by(nameID) %>%
mutate(dummycount = cummax(lag(dummy, default = 0)))
# A tibble: 13 x 5
# Groups: nameID [5]
# nameID titleID year dummy dummycount
# <chr> <chr> <int> <int> <int>
# 1 a b 1999 1 0
# 2 e c 1999 1 0
# 3 i d 2000 0 0
# 4 o f 2000 0 0
# 5 a f 2000 1 1
# 6 e g 2001 0 1
# 7 i h 2002 0 0
# 8 i j 2003 0 0
# 9 u k 2003 1 0
#10 o l 2004 1 0
#11 a m 2004 0 1
#12 o m 2004 0 1
#13 u n 2005 0 1