Detect first occurrences between two variables in R

I would like to flag the first occurrence of each combination of two variables (IPC and 2IPC) in R, leaving out cases in which the two variables are the same (i.e. IPC == 2IPC).

Here is an example dataset:

**date  IPC     2IPC    occurrence** 
 1968   G01S    Na      1
 1969   G01N    G01S    1
 1969   B62D    B43L    1
 1969   G01S    Na      0
 1970   G01S    G01C    1
 1970   G01S    H04B    1
 1970   G01S    H04B    0
 1971   G01S    H01S    1
 1971   G01S    G01S    0
 1972   H04N    H04N    0
 1972   G01S    G01S    0
 1972   G01S    G01S    0

In Excel I used the COUNTIFS function to create a dummy variable (occurrence) that flags the first occurrence of each pair of values. Is it possible to use dplyr for this task?

Solution

Using dplyr, and assuming that the Na entries are ordinary strings rather than missing values (NA), you can run the following code:

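library(dplyr)

mydf %>%
  # one group per (IPC, X2IPC) pair
  group_by(IPC, X2IPC) %>%
  # position of each row within its pair, in row order
  mutate(N_occurences = row_number()) %>%
  mutate(FirstOccurrence = case_when(
    # first time a pair of differing codes appears
    (IPC != X2IPC) & N_occurences == 1 ~ 1,
    # identical codes, or a pair that has already been seen
    (IPC == X2IPC) | N_occurences != 1 ~ 0
  ))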

You'll get the following result:

   X..date IPC   X2IPC occurrence.. N_occurences FirstOccurrence
     <int> <chr> <chr>        <int>        <int>           <dbl>
 1    1968 G01S  Na               1            1            1.00
 2    1969 G01N  G01S             1            1            1.00
 3    1969 B62D  B43L             1            1            1.00
 4    1969 G01S  Na               0            2            0   
 5    1970 G01S  G01C             1            1            1.00
 6    1970 G01S  H04B             1            1            1.00
 7    1970 G01S  H04B             0            2            0   
 8    1971 G01S  H01S             1            1            1.00
 9    1971 G01S  G01S             0            1            0   
10    1972 H04N  H04N             0            1            0   
11    1972 G01S  G01S             0            2            0   
12    1972 G01S  G01S             0            3            0
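To reproduce this, the example data can be typed in by hand. The following is a minimal sketch rather than the original setup: the column names `date` and `occurrence` are assumed (the output above shows `X..date` and `occurrence..`, presumably from how the table was imported), and `result` is just an illustrative name. The final line compares the computed flag with the Excel-derived column from the question.

```r
library(dplyr)

# Example data retyped from the question; "Na" is kept as an ordinary string.
# The column names `date` and `occurrence` are assumptions.
mydf <- data.frame(
  date  = c(1968, 1969, 1969, 1969, 1970, 1970, 1970, 1971, 1971, 1972, 1972, 1972),
  IPC   = c("G01S", "G01N", "B62D", "G01S", "G01S", "G01S",
            "G01S", "G01S", "G01S", "H04N", "G01S", "G01S"),
  X2IPC = c("Na", "G01S", "B43L", "Na", "G01C", "H04B",
            "H04B", "H01S", "G01S", "H04N", "G01S", "G01S"),
  occurrence = c(1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0),
  stringsAsFactors = FALSE
)

result <- mydf %>%
  group_by(IPC, X2IPC) %>%
  mutate(N_occurences = row_number()) %>%
  mutate(FirstOccurrence = case_when(
    (IPC != X2IPC) & N_occurences == 1 ~ 1,
    (IPC == X2IPC) | N_occurences != 1 ~ 0
  )) %>%
  ungroup()  # drop the grouping so later steps work on the whole table

# Does the computed flag agree with the column built in Excel?
all(result$FirstOccurrence == mydf$occurrence)
```

For this data the comparison should return TRUE, since the FirstOccurrence column in the output above is identical to the occurrence column.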

If you want the same data frame as in your original post, just run this code:

mydf %>% 
    group_by(IPC,X2IPC) %>%
    mutate(N_occurences=row_number()) %>% 
    mutate(FirstOccurrence=case_when(
        (IPC!=X2IPC) & N_occurences==1 ~ 1,
        (IPC==X2IPC) | N_occurences!=1 ~ 0
    )) %>%
    # keep date, IPC, X2IPC and the new FirstOccurrence column
    select(1:3, 6)
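The code above relies on Na being a plain string. If those entries were read in as real missing values, `IPC != X2IPC` would evaluate to NA and `case_when()` would return NA for those rows. A possible variant is sketched below, under the assumption that a missing code on either side should count as "different" (which is how the first row of the question is treated). The data frame `mydf_na` and the helper column `differs` are hypothetical names used only for illustration.

```r
library(dplyr)

# Hypothetical subset of the example in which "Na" is a real missing value
mydf_na <- data.frame(
  IPC   = c("G01S", "G01N", "G01S", "G01S", "G01S"),
  X2IPC = c(NA, "G01S", NA, "H04B", "G01S"),
  stringsAsFactors = FALSE
)

result_na <- mydf_na %>%
  group_by(IPC, X2IPC) %>%   # rows with NA in X2IPC are grouped together
  mutate(
    N_occurences = row_number(),
    # assumption: a missing code on either side counts as a differing pair;
    # TRUE | NA is TRUE in R, so `differs` is never NA for those rows
    differs = is.na(IPC) | is.na(X2IPC) | IPC != X2IPC,
    FirstOccurrence = as.integer(differs & N_occurences == 1)
  ) %>%
  ungroup()

result_na$FirstOccurrence
```

Whether a missing second code should really count as a first occurrence depends on the data, so the `is.na()` terms may need to be adjusted or dropped.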