Search code examples
rdataframedummy-variable

Assigning dummy values based on previous occurrences in R


Consider the data frame below:

   nameID titleID year dummy
1       a       b 1999     1
2       e       c 1999     1
3       i       d 2000     0
4       o       f 2000     0
5       a       f 2000     1
6       e       g 2001     0
7       i       h 2002     0
8       i       j 2003     0
9       u       k 2003     1
10      o       l 2004     1
11      a       m 2004     0
12      o       m 2004     0
13      u       n 2005     0

I need a script that will add a new column, "dummycount", and assign either value 0 or 1, depending on the following conditions:

  • 0 = For a given "nameID", no previous occurrence of 1 in the column "dummy"
  • 1 = For a given "nameID", at least a single occurrence of 1 in the column "dummy".

Here is an example of the desired output, that I put together:

   nameID titleID year dummy dummycount
1       a       b 1999     1          0
2       e       c 1999     1          0
3       i       d 2000     0          0
4       o       f 2000     0          0
5       a       f 2000     1          1
6       e       g 2001     0          1
7       i       h 2002     0          0
8       i       j 2003     0          0
9       u       k 2003     1          0
10      o       l 2004     1          0
11      a       m 2004     0          1
12      o       m 2004     0          1
13      u       n 2005     0          1

As you see, "dummycounts" only takes the value 1, if the "nameID" has at least one previous, single occurrence of 1 in the "dummy" column.

Thank you for your help!


Solution

  • We need lag

    library(dplyr)
    df1 %>% 
      group_by(nameID) %>%
      mutate(dummycount =  cummax(lag(dummy, default = 0)))
    # A tibble: 13 x 5
    # Groups:   nameID [5]
    #   nameID titleID  year dummy dummycount
    #   <chr>  <chr>   <int> <int>      <int>
    # 1 a      b        1999     1          0
    # 2 e      c        1999     1          0
    # 3 i      d        2000     0          0
    # 4 o      f        2000     0          0
    # 5 a      f        2000     1          1
    # 6 e      g        2001     0          1
    # 7 i      h        2002     0          0
    # 8 i      j        2003     0          0
    # 9 u      k        2003     1          0
    #10 o      l        2004     1          0
    #11 a      m        2004     0          1
    #12 o      m        2004     0          1
    #13 u      n        2005     0          1