I have a data frame with multiple observations per ID. Each ID has performed many tests and for each test, has been classified into a tertile rank (top, mid, bottom) based on their test performance. This rank can vary at each time point.
Dummy df looks like this:
df <- tibble(ID = c(1,1,1,2,2,2,3,3,3,4,4,4),
time = c(1,2,3,1,2,3,1,2,3,1,2,3),
test1_rank <- c("top", "top", "top", "top", "mid", "bottom", "bottom", "bottom", "bottom", "top", "bottom", "bottom"),
test2_rank <- c("bottom", "bottom", "bottom", "top", "mid", "bottom", "top", "top", "top", "top", "bottom", "bottom"))
ID time `test1_rank <- ...` `test2_rank <- ...`
<dbl> <dbl> <chr> <chr>
1 1 1 top bottom
2 1 2 top bottom
3 1 3 top bottom
4 2 1 top top
5 2 2 mid mid
6 2 3 bottom bottom
7 3 1 bottom top
8 3 2 bottom top
9 3 3 bottom top
10 4 1 top top
11 4 2 bottom bottom
12 4 3 bottom bottom
I want to create a new variable where I classify each individual based on how their rank changes with time for each test. Specifically, for each test:
Desired data frame:
df2 <- tibble(ID = c(1,1,1,2,2,2,3,3,3,4,4,4),
time = c(1,2,3,1,2,3,1,2,3,1,2,3),
test1_rank <- c("top", "top", "top", "top", "mid", "bottom", "bottom", "bottom", "bottom", "top", "bottom", "bottom"),
test2_rank <- c("bottom", "bottom", "bottom", "top", "mid", "bottom", "top", "top", "top", "top", "bottom", "bottom"),
test1_rankgroup <- c("stable(top)", "stable(top)", "stable(top)", "gradual", "gradual", "gradual", "stable(bottom)", "stable(bottom)", "stable(bottom)", "rapid", "rapid", "rapid"),
test2_rankgroup <- c("stable(bottom)", "stable(bottom)", "stable(bottom)", "gradual", "gradual", "gradual", "stable(top)", "stable(top)", "stable(top)", "rapid", "rapid", "rapid"))
ID time `test1_rank <- ...` `test2_rank <- ...` `test1_rankgroup <- ...` test2_ra…¹
<dbl> <dbl> <chr> <chr> <chr> <chr>
1 1 1 top bottom stable(top) stable(bo…
2 1 2 top bottom stable(top) stable(bo…
3 1 3 top bottom stable(top) stable(bo…
4 2 1 top top gradual gradual
5 2 2 mid mid gradual gradual
6 2 3 bottom bottom gradual gradual
7 3 1 bottom top stable(bottom) stable(to…
8 3 2 bottom top stable(bottom) stable(to…
9 3 3 bottom top stable(bottom) stable(to…
10 4 1 top top rapid rapid
11 4 2 bottom bottom rapid rapid
12 4 3 bottom bottom rapid rapid
What is the easiest way to solve this in dplyr
using mutate
and case_when
?
You can use mutate
with across
to apply your case_when
to both of your ranked columns.
Your case_when
can use n_distinct
to see if the same value is held across your 3 time points. You can also include specific logic for 1st, 2nd, and 3rd values for a labelled group value.
The .default argument can be used for "other". The .by argument will perform the mutate
grouped by ID
.
library(tidyverse)
df <- tibble(ID = c(1,1,1,2,2,2,3,3,3,4,4,4),
time = c(1,2,3,1,2,3,1,2,3,1,2,3),
test1_rank = c("top", "top", "top", "top", "mid", "bottom", "bottom", "bottom", "bottom", "top", "bottom", "bottom"),
test2_rank = c("bottom", "bottom", "bottom", "top", "mid", "bottom", "top", "top", "top", "top", "bottom", "bottom"))
df %>%
mutate(across(-time,
~ case_when(n_distinct(.) == 1 ~ paste("stable", .),
.[1] == "top" & .[2] == "mid" & .[3] == "bottom" ~ "gradual",
.[1] == "top" & .[2] == "bottom" & .[3] == "bottom" ~ "rapid",
.default = "other"),
.names = "{.col}_group"),
.by = ID)
Output
ID time test1_rank test2_rank test1_rank_group test2_rank_group
<dbl> <dbl> <chr> <chr> <chr> <chr>
1 1 1 top bottom stable top stable bottom
2 1 2 top bottom stable top stable bottom
3 1 3 top bottom stable top stable bottom
4 2 1 top top gradual gradual
5 2 2 mid mid gradual gradual
6 2 3 bottom bottom gradual gradual
7 3 1 bottom top stable bottom stable top
8 3 2 bottom top stable bottom stable top
9 3 3 bottom top stable bottom stable top
10 4 1 top top rapid rapid
11 4 2 bottom bottom rapid rapid
12 4 3 bottom bottom rapid rapid