I would like to count the first occurrences of each combination of two variables (IPC and 2IPC) in R, leaving out cases in which the two variables are the same (i.e. only rows where IPC != 2IPC should be counted).
Here is an example dataset:
```
date  IPC   2IPC  occurrence
1968  G01S  Na    1
1969  G01N  G01S  1
1969  B62D  B43L  1
1969  G01S  Na    0
1970  G01S  G01C  1
1970  G01S  H04B  1
1970  G01S  H04B  0
1971  G01S  H01S  1
1971  G01S  G01S  0
1972  H04N  H04N  0
1972  G01S  G01S  0
1972  G01S  G01S  0
```
In Excel I used the COUNTIFS function to create a dummy variable (occurrence) that flags the first occurrence of each IPC/2IPC pair. Is it possible to do this with dplyr?
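To make the example reproducible, the table above can be entered as a data frame. This is a minimal sketch: the name `mydf` matches the one used in the answer, the second code column is written as `X2IPC` (the name R itself would give to `2IPC`), and `Na` is stored as an ordinary string rather than a missing value.

```r
# Example data from the question.
# "2IPC" is not a syntactic R name, so the column is called X2IPC here,
# which is also what make.names() would turn "2IPC" into.
# "Na" is kept as a plain string, not as a missing value (NA).
mydf <- data.frame(
  date       = c(1968L, 1969L, 1969L, 1969L, 1970L, 1970L,
                 1970L, 1971L, 1971L, 1972L, 1972L, 1972L),
  IPC        = c("G01S", "G01N", "B62D", "G01S", "G01S", "G01S",
                 "G01S", "G01S", "G01S", "H04N", "G01S", "G01S"),
  X2IPC      = c("Na", "G01S", "B43L", "Na", "G01C", "H04B",
                 "H04B", "H01S", "G01S", "H04N", "G01S", "G01S"),
  occurrence = c(1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

str(mydf)
```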
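Using dplyr, and assuming that `Na` is a valid value (the string "Na") rather than a missing value (`NA`), you can run the following code. Note that the column `2IPC` appears as `X2IPC`: R's `make.names()` (used by `read.table()` and `data.frame()`) prefixes an `X` to column names that start with a digit.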
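```r
library(dplyr)

mydf %>%
  # One group per (IPC, X2IPC) pair; the original row order is kept
  group_by(IPC, X2IPC) %>%
  # Running count within each pair: 1 on its first row, 2 on its second, ...
  mutate(N_occurences = row_number()) %>%
  # 1 only for the first row of a pair whose two codes differ, otherwise 0
  mutate(FirstOccurrence = case_when(
    (IPC != X2IPC) & N_occurences == 1 ~ 1,
    (IPC == X2IPC) | N_occurences != 1 ~ 0
  ))
```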
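The grouped `row_number()` is what reproduces the expanding-range count that COUNTIFS performs in Excel. For comparison, here is a base R sketch of the same idea (an alternative, not part of the dplyr answer) that uses `ave()` to number the rows within each (IPC, X2IPC) pair; it rebuilds the example data as `mydf` so that it runs on its own.

```r
mydf <- data.frame(
  date       = c(1968L, 1969L, 1969L, 1969L, 1970L, 1970L,
                 1970L, 1971L, 1971L, 1972L, 1972L, 1972L),
  IPC        = c("G01S", "G01N", "B62D", "G01S", "G01S", "G01S",
                 "G01S", "G01S", "G01S", "H04N", "G01S", "G01S"),
  X2IPC      = c("Na", "G01S", "B43L", "Na", "G01C", "H04B",
                 "H04B", "H01S", "G01S", "H04N", "G01S", "G01S"),
  occurrence = c(1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

# Running count of each (IPC, X2IPC) pair in row order: ave() splits the
# row indices by pair and seq_along() numbers them 1, 2, 3, ... per group.
mydf$N_occurences <- ave(seq_len(nrow(mydf)), mydf$IPC, mydf$X2IPC,
                         FUN = seq_along)

# Dummy: first row of a pair AND the two codes differ.
mydf$FirstOccurrence <- as.integer(mydf$N_occurences == 1 &
                                     mydf$IPC != mydf$X2IPC)

mydf
```

The resulting `FirstOccurrence` column matches the `occurrence` dummy from the question.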
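You'll get the following result (`X..date` and `occurrence..` are the names R produced from the header as it was pasted, with its bold `**` markers):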
```
   X..date IPC   X2IPC occurrence.. N_occurences FirstOccurrence
     <int> <chr> <chr>        <int>        <int>           <dbl>
 1    1968 G01S  Na               1            1            1.00
 2    1969 G01N  G01S             1            1            1.00
 3    1969 B62D  B43L             1            1            1.00
 4    1969 G01S  Na               0            2            0
 5    1970 G01S  G01C             1            1            1.00
 6    1970 G01S  H04B             1            1            1.00
 7    1970 G01S  H04B             0            2            0
 8    1971 G01S  H01S             1            1            1.00
 9    1971 G01S  G01S             0            1            0
10    1972 H04N  H04N             0            1            0
11    1972 G01S  G01S             0            2            0
12    1972 G01S  G01S             0            3            0
```
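If you want a data frame with the same columns as in your question, add a `select()` step:

```r
mydf %>%
  group_by(IPC, X2IPC) %>%
  mutate(N_occurences = row_number()) %>%
  mutate(FirstOccurrence = case_when(
    (IPC != X2IPC) & N_occurences == 1 ~ 1,
    (IPC == X2IPC) | N_occurences != 1 ~ 0
  )) %>%
  # Drop the grouping so later steps are not silently applied per group
  ungroup() %>%
  # Keep columns 1-3 (date, IPC, X2IPC) and 6 (FirstOccurrence)
  select(1:3, 6)
```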
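Because the two `case_when()` conditions are logical complements of each other (as long as there are no real `NA`s), a single logical expression gives the same dummy. A sketch under that assumption, rebuilding the example data as `mydf` so the block is self-contained and selecting columns by name instead of by position:

```r
library(dplyr)

mydf <- data.frame(
  date       = c(1968L, 1969L, 1969L, 1969L, 1970L, 1970L,
                 1970L, 1971L, 1971L, 1972L, 1972L, 1972L),
  IPC        = c("G01S", "G01N", "B62D", "G01S", "G01S", "G01S",
                 "G01S", "G01S", "G01S", "H04N", "G01S", "G01S"),
  X2IPC      = c("Na", "G01S", "B43L", "Na", "G01C", "H04B",
                 "H04B", "H01S", "G01S", "H04N", "G01S", "G01S"),
  occurrence = c(1L, 1L, 1L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 0L, 0L),
  stringsAsFactors = FALSE
)

result <- mydf %>%
  group_by(IPC, X2IPC) %>%
  # TRUE only on the first row of a pair whose two codes differ;
  # as.integer() turns TRUE/FALSE into the 1/0 dummy.
  mutate(FirstOccurrence = as.integer(row_number() == 1 & IPC != X2IPC)) %>%
  ungroup() %>%
  select(date, IPC, X2IPC, FirstOccurrence)

result
```

Selecting by name keeps the result correct even if columns are later added or reordered.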