r dplyr

How to check if all students in the same class have the same teachers


I don't know how to write this in general terms, so I'll be specific. In the TIMSS datasets, students are nested in classes. Each student is also linked to between one and four teachers. I want to check whether all students within the same class have the same teachers, or whether some students in a class have different teachers.

I want it to fit in a dplyr pipe chain.

Example df:

df <- data.frame(
    IDSTUD = c(50040201, 50040201, 50040201, 
    50040201, 50040204, 50040204, 50040204, 50040204, 50120303, 50120303, 
    50120303, 50120303, 50120304, 50120304, 50120304), 
    IDCLASS = c(500402, 500402, 
    500402, 500402, 500402, 500402, 500402, 500402, 501203, 501203, 
    501203, 501203, 501203, 501203, 501203), 
    IDTEACH = c(500402, 500403, 
    500404, 500405, 500405, 500404, 500403, 500402, 501201, 501202, 
    501205, 501203, 501203, 501205, 501202)
)


unique_teachers <- df %>% some_code(...)

The desired output lists the students who have a teacher that other students in the same class don't have, e.g.:

> unique_teachers

IDSTUD      IDCLASS IDTEACH
50120303    501203  501201

Solution

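  • In the answer below, I first group by class to count the unique students in each class. Then I group by class and teacher and keep the class-teacher pairs whose row count differs from the number of students in the class, i.e. teachers who do not teach every student in that class. This assumes each student-teacher pair appears at most once in the data.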

    library(dplyr)
    df <- data.frame(
      IDSTUD = c(50040201, 50040201, 50040201, 
                 50040201, 50040204, 50040204, 50040204, 50040204, 50120303, 50120303, 
                 50120303, 50120303, 50120304, 50120304, 50120304), 
      IDCLASS = c(500402, 500402, 
                  500402, 500402, 500402, 500402, 500402, 500402, 501203, 501203, 
                  501203, 501203, 501203, 501203, 501203), 
      IDTEACH = c(500402, 500403, 
                  500404, 500405, 500405, 500404, 500403, 500402, 501201, 501202, 
                  501205, 501203, 501203, 501205, 501202)
    )
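    df %>% 
      # Count the distinct students in each class
      group_by(IDCLASS) %>% 
      mutate(n_students = n_distinct(IDSTUD)) %>% 
      # Keep class-teacher pairs that do not cover every student in the class
      group_by(IDCLASS, IDTEACH) %>% 
      filter(n_students != n())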
    #> # A tibble: 1 × 4
    #> # Groups:   IDCLASS, IDTEACH [1]
    #>     IDSTUD IDCLASS IDTEACH n_students
    #>      <dbl>   <dbl>   <dbl>      <int>
    #> 1 50120303  501203  501201          2
    

    Created on 2024-05-27 with reprex v2.0.2
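
The question also asks for a per-class yes/no check, so here is a minimal sketch of one way to get that with the same counting idea. The names `class_check`, `same_teachers`, and `n_taught` are made up for illustration, and the `distinct()` call is an added guard against repeated student-teacher rows, which the example data does not contain. The data is the same as in the question, written more compactly with `rep()`.

```r
library(dplyr)

# Same toy data as in the question, written compactly
df <- data.frame(
  IDSTUD  = c(rep(50040201, 4), rep(50040204, 4), rep(50120303, 4), rep(50120304, 3)),
  IDCLASS = c(rep(500402, 8), rep(501203, 7)),
  IDTEACH = c(500402, 500403, 500404, 500405, 500405, 500404, 500403, 500402,
              501201, 501202, 501205, 501203, 501203, 501205, 501202)
)

# Shared first steps: one row per student-teacher pair, plus class size
counted <- df %>%
  distinct(IDSTUD, IDCLASS, IDTEACH) %>%      # guard against repeated rows
  group_by(IDCLASS) %>%
  mutate(n_students = n_distinct(IDSTUD)) %>%
  group_by(IDCLASS, IDTEACH) %>%
  mutate(n_taught = n()) %>%                  # students taught by this teacher
  ungroup()

# One row per class: TRUE when every teacher teaches every student in the class
class_check <- counted %>%
  group_by(IDCLASS) %>%
  summarise(same_teachers = all(n_taught == n_students))

# Rows where a teacher is not shared by the whole class (the desired output)
unique_teachers <- counted %>%
  filter(n_taught != n_students) %>%
  select(IDSTUD, IDCLASS, IDTEACH)
```

With the example data, `class_check` should flag class 500402 as `TRUE` and class 501203 as `FALSE`, and `unique_teachers` should contain the single row for student 50120303 and teacher 501201. Dropping the helper columns with `select()` returns the three columns shown in the question.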