r, dplyr

How to identify whether rows within a group have values in the same columns?


I have a data frame in which each row holds the ratings a project group received from one of three raters (advisor, mentor, other). The columns ELA1, ELA2, MATH1, MATH2, ESS1 and ESS2 are the items a group can be rated on. However, each group only needs to be rated on three of these items, so different groups can be rated on different sets of items. Within a group, though, all three raters should rate the same items. I want to check whether all three raters rated the same set of items within each group. Ideally, I would create a dummy variable that indicates whether the group received ratings on the same set of items from every rater.

Below is an example of my data frame:

df <- data.frame(group=c("A","A","A","B","B","B","C","C","C"),
                 rater=c("Advisor", "Mentor", "Other", "Advisor", "Mentor", "Other", "Advisor", "Mentor", "Other"),
                 ELA1=c(1, 2, 2, NA, NA, 1, NA, NA, NA),
                 ELA2=c(NA, NA, NA, 2, NA, 1, NA, NA, NA),
                 MATH1=c(3, 3, 2, NA, 2, NA, 3, 3, 2),
                 MATH2=c(2, 3, 2, NA, NA, 1, 3, 3, 1),
                 ESS1=c(NA, NA, NA, 2, 2, NA, 3, 3, 1),
                 ESS2=c(NA, NA, NA, 2, 2, NA, NA, NA, NA))

In the example data frame above, two groups (A and C) were rated on the same items by all three raters. The three raters of group B, however, did not rate the same set of items. I need help writing code that identifies groups whose raters did not all rate the same set of items, ideally by creating an indicator column that shows whether a group's raters rated the same set of items or not.
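The pattern described above can be made visible by marking which item cells are filled in. This is a minimal base-R sketch, assuming every column other than `group` and `rater` is a rating item; `item_cols` and `rated` are illustrative names, not part of the original data.

```r
# Example data from the question
df <- data.frame(group=c("A","A","A","B","B","B","C","C","C"),
                 rater=c("Advisor", "Mentor", "Other", "Advisor", "Mentor", "Other", "Advisor", "Mentor", "Other"),
                 ELA1=c(1, 2, 2, NA, NA, 1, NA, NA, NA),
                 ELA2=c(NA, NA, NA, 2, NA, 1, NA, NA, NA),
                 MATH1=c(3, 3, 2, NA, 2, NA, 3, 3, 2),
                 MATH2=c(2, 3, 2, NA, NA, 1, 3, 3, 1),
                 ESS1=c(NA, NA, NA, 2, 2, NA, 3, 3, 1),
                 ESS2=c(NA, NA, NA, 2, 2, NA, NA, NA, NA))

# Assumption: every column except "group" and "rater" is a rating item
item_cols <- setdiff(colnames(df), c("group", "rater"))

# Logical matrix: TRUE where a rater gave a score, FALSE where the cell is NA
rated <- !is.na(df[item_cols])
rownames(rated) <- paste(df$group, df$rater)
print(rated)
```

Each row of `rated` is the set of items one rater scored, so raters agree when their rows are identical: the rows repeat within groups A and C, while group B's three rows all differ.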

I have no code to share because I don't know how to approach this at all. Is there any way to do this in R?


Solution

  • Here's another answer that is maybe less elegant but possibly simpler to digest. It makes use of the colSums function.

    df <- data.frame(group=c("A","A","A","B","B","B","C","C","C"),
                     rater=c("Advisor", "Mentor", "Other", "Advisor", "Mentor", "Other", "Advisor", "Mentor", "Other"),
                     ELA1=c(1, 2, 2, NA, NA, 1, NA, NA, NA),
                     ELA2=c(NA, NA, NA, 2, NA, 1, NA, NA, NA),
                     MATH1=c(3, 3, 2, NA, 2, NA, 3, 3, 2),
                     MATH2=c(2, 3, 2, NA, NA, 1, 3, 3, 1),
                     ESS1=c(NA, NA, NA, 2, 2, NA, 3, 3, 1),
                     ESS2=c(NA, NA, NA, 2, 2, NA, NA, NA, NA))
    
    # Every column except "group" and "rater" is a rating item
    item_cols <- setdiff(colnames(df), c("group", "rater"))
    
    # Named logical vector with one TRUE/FALSE entry per group
    same_rating_set <- logical(0)
    
    # Visit each group once (df$group repeats the group name for every rater)
    for (group in unique(df$group)) {
        group_ratings <- list()
        # Only visit the raters that actually appear in this group
        for (rater in unique(df$rater[df$group == group])) {
            # Item columns for a single group/rater combo; drop = FALSE keeps
            # a data.frame even if only one column would be selected
            rater_df <- df[df$group == group & df$rater == rater, item_cols, drop = FALSE]
            # Running `colSums` on a single-row data.frame flags which
            # columns have a missing value (1 = NA, 0 = rated)
            missing_columns <- colSums(is.na(rater_df))
            # Just keep the names of the non-missing columns
            rated_items <- item_cols[missing_columns == 0]
            # Add the current rater's rated items to this group's list
            group_ratings[[length(group_ratings) + 1]] <- rated_items
        }
        # group_ratings now holds one set of items per rater. If across those
        # sets there is only 1 unique set, the sets were identical (i.e. the
        # raters all rated the same items).
        same_rating_set[group] <- length(unique(group_ratings)) == 1
    }
    
    # Look up each row's group to get one value per row
    df$same_rating_set <- unname(same_rating_set[as.character(df$group)])
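The same check can also be written without nested loops by encoding each row's rated items as a single string and counting the distinct strings per group. This is a sketch of that alternative in base R, not the answer's colSums method; `item_cols`, `rated_items` and `n_patterns` are illustrative names, and it assumes every column other than `group` and `rater` is a rating item.

```r
df <- data.frame(group=c("A","A","A","B","B","B","C","C","C"),
                 rater=c("Advisor", "Mentor", "Other", "Advisor", "Mentor", "Other", "Advisor", "Mentor", "Other"),
                 ELA1=c(1, 2, 2, NA, NA, 1, NA, NA, NA),
                 ELA2=c(NA, NA, NA, 2, NA, 1, NA, NA, NA),
                 MATH1=c(3, 3, 2, NA, 2, NA, 3, 3, 2),
                 MATH2=c(2, 3, 2, NA, NA, 1, 3, 3, 1),
                 ESS1=c(NA, NA, NA, 2, 2, NA, 3, 3, 1),
                 ESS2=c(NA, NA, NA, 2, 2, NA, NA, NA, NA))

# Assumption: every column except "group" and "rater" is a rating item
item_cols <- setdiff(colnames(df), c("group", "rater"))

# Encode each row's rated items as one string, e.g. "ELA1,MATH1,MATH2"
df$rated_items <- apply(!is.na(df[item_cols]), 1,
                        function(rated) paste(item_cols[rated], collapse = ","))

# Number of distinct item patterns within each group (named by group)
n_patterns <- tapply(df$rated_items, df$group, function(x) length(unique(x)))

# A group passes when all of its raters share exactly one pattern
df$same_rating_set <- df$group %in% names(n_patterns)[n_patterns == 1]

print(df[, c("group", "rater", "rated_items", "same_rating_set")])
```

Keeping the `rated_items` column also makes it easy to see which items each rater in a failing group (here B) actually scored.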