Search code examples
rdplyrgroup-bymutate

r - Group by two columns and create a new sequential number for values of one of the columns


I need to use r (dplyr) to group_by two variables (schoolid, grade) to create a third sequential numeric variable, that restarts from 1:n for each grade within each school. In other words, the new_id increments (1:n) for each contiguous block of unique grade values. It should look like the new_id column in the data frame below.

structure(list(schoolid = c(201L, 201L, 201L, 201L, 201L, 201L, 
201L, 201L, 201L, 201L, 201L, 201L, 201L, 202L, 202L, 202L, 202L, 
202L, 202L, 202L, 202L, 202L), grade = c(3L, 3L, 4L, 4L, 5L, 
5L, 3L, 3L, 5L, 5L, 5L, 3L, 3L, 3L, 3L, 4L, 4L, 5L, 5L, 3L, 4L, 
4L), new_id = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 3L, 
3L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L)), row.names = c(NA, 22L
), class = "data.frame")

I want to produce to following dataframe with the new_id column added but right now I just have schoolid and grade

OR

data table I want to produce

I tried to use the following code but the output does not do what I am looking for as shown in the table above


data %>%
  group_by(schoolid, grade) %>%
  mutate(new_id = 1:n())

Solution

  • If I'm understanding right, I think this should work. I use the data.table::rleid function to create a helper variable of contiguous values.

    Next time please share sample data as copy/pasteable text so we don't have to transcribe from an image :)

    sample_data = data.frame(
      schoolid = 201,
      grade = c(3,3,4,4,5,5,3,3,5,5,5,3,3)
    )
    
    library(dplyr)
    sample_data %>%
      group_by(schoolid) %>%
      mutate(
        g = data.table::rleid(grade)
      ) %>%
      group_by(schoolid, grade) %>%
      mutate(
        new_id = dense_rank(g)
      ) %>%
      ungroup()
    # # A tibble: 13 × 4
    #    schoolid grade     g new_id
    #       <dbl> <dbl> <int>  <int>
    #  1      201     3     1      1
    #  2      201     3     1      1
    #  3      201     4     2      1
    #  4      201     4     2      1
    #  5      201     5     3      1
    #  6      201     5     3      1
    #  7      201     3     4      2
    #  8      201     3     4      2
    #  9      201     5     5      2
    # 10      201     5     5      2
    # 11      201     5     5      2
    # 12      201     3     6      3
    # 13      201     3     6      3