How do I create an index variable for unique values of X within a group Y?

I have the following table:

id_question  id_event   num_events
2015012713    49508          1
2015012711    49708          1
2015011523    41808          3
2015011523    44008          3
2015011523    44108          3
2015011522    41508          3
2015011522    43608          3
2015011522    43708          3
2015011521    39708          1
2015011519    44208          1

The third column gives the count of events by question. I want to create a variable that would index the events by question only where there are multiple events per question. It would look something like that:

id_question  id_event   num_events  index_event
2015012713    49508          1          
2015012711    49708          1          
2015011523    41808          3          1
2015011523    44008          3          2
2015011523    44108          3          3
2015011522    41508          3          1
2015011522    43608          3          2
2015011522    43708          3          3
2015011521    39708          1          
2015011519    44208          1

How can I do that?

Solution

We can use tidyverse to create an 'index_event' after grouping by 'id_question'. If the number of rows are greater than 1 (n() >1), then get the sequence of rows (row_number()) and the default option in case_when is NA

library(dplyr)
df1 %>%
   group_by(id_question) %>%
   mutate(index_event = case_when(n() >1 ~ row_number()))
# A tibble: 10 x 4
# Groups:   id_question [6]
#   id_question id_event num_events index_event
#         <int>    <int>      <int>       <int>
# 1  2015012713    49508          1          NA
# 2  2015012711    49708          1          NA
# 3  2015011523    41808          3           1
# 4  2015011523    44008          3           2
# 5  2015011523    44108          3           3
# 6  2015011522    41508          3           1
# 7  2015011522    43608          3           2
# 8  2015011522    43708          3           3
# 9  2015011521    39708          1          NA
#10  2015011519    44208          1          NA

Or with data.table, we use rowid on 'id_question' and change the elements that are 1 in 'num_events' to NA with NA^ (making use of NA^0, NA^1)

library(data.table)
setDT(df1)[, index_event := rowid(id_question) * NA^(num_events == 1)]

Or using base R, another option with the sequence of frequency from 'id_question' and change elements to NA as in the previous case

df1$index_event <-  with(df1, sequence(table(id_question)) * NA^(num_events == 1))
df1$index_event
#[1] NA NA  1  2  3  1  2  3 NA NA

data

df1 <- structure(list(id_question = c(2015012713L, 2015012711L, 2015011523L, 
2015011523L, 2015011523L, 2015011522L, 2015011522L, 2015011522L, 
2015011521L, 2015011519L), id_event = c(49508L, 49708L, 41808L, 
44008L, 44108L, 41508L, 43608L, 43708L, 39708L, 44208L), num_events = c(1L, 
1L, 3L, 3L, 3L, 3L, 3L, 3L, 1L, 1L)), class = "data.frame", row.names = c(NA, 
-10L))