Search code examples
rfilterflags

flag missing categories within a group in an R dataframe


I would like to flag itemid when the score variable is missing any score between 0 and max.score of that item. Here is an example dataset.

df <- data.frame(id = c(1,2,3, 1,2,3, 1,2,3, 1,2,3, 1,2,3),
                 itemid = c(11,11,11, 12,12,12, 13,13,13, 14,14,14, 15,15,15),
                 score = c(0,1,2, 1,2,2, 0,1,1, 2,0,0, 1,1,1),
                 max.points = c(2,2,2, 2,2,2, 1,1,1, 2,2,2, 1,1,1 ))

> df
   id itemid score max.points
1   1     11     0          2
2   2     11     1          2
3   3     11     2          2
4   1     12     1          2
5   2     12     2          2
6   3     12     2          2
7   1     13     0          1
8   2     13     1          1
9   3     13     1          1
10  1     14     2          2
11  2     14     0          2
12  3     14     0          2
13  1     15     1          1
14  2     15     1          1
15  3     15     1          1

In this case, itemid=12 is missing score of 0, itemid=14 is missing score of 1, and itemid=15 is missing score of 0 .

Any ideas on how to flag items 12,14 and 15?

Thanks!


Solution

  • Grouped by 'itemid', create the flag by checking if all of the sequence of max.points (from 0) are %in% 'score' and negate (!)

    library(dplyr)
    df %>%
      group_by(itemid) %>%
      mutate(flag = !all(c(0, seq_len(first(max.points))) %in% score)) %>%
      ungroup
    

    -output

    # A tibble: 15 × 5
          id itemid score max.points flag 
       <dbl>  <dbl> <dbl>      <dbl> <lgl>
     1     1     11     0          2 FALSE
     2     2     11     1          2 FALSE
     3     3     11     2          2 FALSE
     4     1     12     1          2 TRUE 
     5     2     12     2          2 TRUE 
     6     3     12     2          2 TRUE 
     7     1     13     0          1 FALSE
     8     2     13     1          1 FALSE
     9     3     13     1          1 FALSE
    10     1     14     2          2 TRUE 
    11     2     14     0          2 TRUE 
    12     3     14     0          2 TRUE 
    13     1     15     1          1 TRUE 
    14     2     15     1          1 TRUE 
    15     3     15     1          1 TRUE