Let's consider markers
with their coefficient of variation (cv
) and three reference cv (rcv
):
Initial data:
marker cv rcv1 rcv2 rcv3
<chr> <dbl> <dbl> <dbl> <dbl>
1 AAA 7 10 8 5
2 BBB 4 5 3 1
3 CCC 11 20 15 12
4 DDD 8 7 5 2
I would like to mutate three new variables:
rcv_value
: the closest rcv
value greater than the cv
rcv_name
: the column name of that rcv_value
cv_conclusion
:
cv
is lower than one or the other of the rcvs
cv
is higher than the highest rcv
Desired output:
marker cv rcv1 rcv2 rcv3 rcv_value rcv_name cv_conclusion
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
1 AAA 7 10 8 5 8 rcv2 ok
2 BBB 4 5 3 1 5 rcv1 ok
3 CCC 11 20 15 12 12 rcv3 ok
4 DDD 8 7 5 2 7 rcv1 ko
NB: my real data has more than 100 markers
and about 10 different rcv
.
Where I fail is getting the rcv_name
from the corresponding rcv_value
(using mutate
and case_when
).
Thanks for help.
Data:
dat0 <-
structure(list(marker = c("AAA", "BBB", "CCC", "DDD"), cv = c(7,
4, 11, 8), rcv1 = c(10, 5, 20, 7), rcv2 = c(8, 3, 15, 5), rcv3 = c(5,
1, 12, 2)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L))
This is a case where reshaping to long makes it much more straightforward. First use tidyr::pivot_longer()
, then dplyr::mutate(.by = marker)
to perform these operations by group, and then tidyr::pivot_wider()
to return the data to the original shape.
library(dplyr)
library(tidyr)
dat0 |>
pivot_longer(-c(marker, cv)) |>
mutate(
rcv_value = min(value[value > cv]),
# rcv_value will be Inf if no values > cv
cv_conclusion = if_else(is.infinite(rcv_value), "ko", "ok"),
rcv_value = if_else(is.infinite(rcv_value), max(value), rcv_value),
rcv_name = name[rcv_value == value],
.by = marker
) |>
# reshape back to wide
pivot_wider(id_cols = c(marker, cv, rcv_value, rcv_name, cv_conclusion)) |>
# reorder columns as desired
relocate(marker, cv, rcv1:rcv3, rcv_value:cv_conclusion)
# # A tibble: 4 × 8
# marker cv rcv1 rcv2 rcv3 rcv_value rcv_name cv_conclusion
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <chr> <chr>
# 1 AAA 7 10 8 5 8 rcv2 ok
# 2 BBB 4 5 3 1 5 rcv1 ok
# 3 CCC 11 20 15 12 12 rcv3 ok
# 4 DDD 8 7 5 2 7 rcv1 ko
If you can have reference cv ties (i.e. repeated values in the same row in rcv1:rcv3
columns) you'll have to specify which you want rcv_name
to include. If you just want the first value you could change that line to rcv_name = name[rcv_value == value][1]
. Alternatively, if you are doing further processing you could create a list column to store all the values, e.g. list(name[rcv_value == value])
(though I might prefer to just keep the table in long form). Alternatively, if this table is an output, you might want to paste them together e.g. rcv_name = paste(name[rcv_value == value], collapse = ",")
to create results in the column such as "rcv2,rcv3"
.