Search code examples
rdataframedplyrmutate

Mutate the column name from the corresponding selected value among many columns


Let's consider markers with their coefficient of variation (cv) and three reference cv (rcv):

Initial data:

  marker    cv  rcv1  rcv2  rcv3
  <chr>  <dbl> <dbl> <dbl> <dbl>
1 AAA        7    10     8     5
2 BBB        4     5     3     1
3 CCC       11    20    15    12
4 DDD        8     7     5     2  

I would like to mutate three new variables:

  • rcv_value: the closest rcv value greater than the cv
  • rcv_name: the column name of that rcv_value
  • cv_conclusion:
    • ok if the cv is lower than one or the other of the rcvs
    • ko if the cv is higher than the highest rcv

Desired output:

  marker    cv  rcv1  rcv2  rcv3 rcv_value rcv_name cv_conclusion
  <chr>  <dbl> <dbl> <dbl> <dbl>     <dbl> <chr>    <chr>        
1 AAA        7    10     8     5         8 rcv2     ok           
2 BBB        4     5     3     1         5 rcv1     ok           
3 CCC       11    20    15    12        12 rcv3     ok           
4 DDD        8     7     5     2         7 rcv1     ko  

NB: my real data has more than 100 markers and about 10 different rcv.
Where I fail is getting the rcv_name from the corresponding rcv_value (using mutate and case_when).

Thanks for help.

Data:

dat0 <-
structure(list(marker = c("AAA", "BBB", "CCC", "DDD"), cv = c(7, 
4, 11, 8), rcv1 = c(10, 5, 20, 7), rcv2 = c(8, 3, 15, 5), rcv3 = c(5, 
1, 12, 2)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -4L))

Solution

  • This is a case where reshaping to long makes it much more straightforward. First use tidyr::pivot_longer(), then dplyr::mutate(.by = marker) to perform these operations by group, and then tidyr::pivot_wider() to return the data to the original shape.

    library(dplyr)
    library(tidyr)
    
    dat0 |>
        pivot_longer(-c(marker, cv)) |>
        mutate(
            rcv_value = min(value[value > cv]),
            # rcv_value will be Inf if no values > cv
            cv_conclusion = if_else(is.infinite(rcv_value), "ko", "ok"),
            rcv_value = if_else(is.infinite(rcv_value), max(value), rcv_value),
            rcv_name = name[rcv_value == value],
            .by = marker
        ) |>
        # reshape back to wide
        pivot_wider(id_cols = c(marker, cv, rcv_value, rcv_name, cv_conclusion)) |>
        # reorder columns as desired
        relocate(marker, cv, rcv1:rcv3, rcv_value:cv_conclusion)
    
    # # A tibble: 4 × 8
    #   marker    cv  rcv1  rcv2  rcv3 rcv_value rcv_name cv_conclusion
    #   <chr>  <dbl> <dbl> <dbl> <dbl>     <dbl> <chr>    <chr>        
    # 1 AAA        7    10     8     5         8 rcv2     ok           
    # 2 BBB        4     5     3     1         5 rcv1     ok           
    # 3 CCC       11    20    15    12        12 rcv3     ok           
    # 4 DDD        8     7     5     2         7 rcv1     ko    
    

    A note on ties

    If you can have reference cv ties (i.e. repeated values in the same row in rcv1:rcv3 columns) you'll have to specify which you want rcv_name to include. If you just want the first value you could change that line to rcv_name = name[rcv_value == value][1]. Alternatively, if you are doing further processing you could create a list column to store all the values, e.g. list(name[rcv_value == value]) (though I might prefer to just keep the table in long form). Alternatively, if this table is an output, you might want to paste them together e.g. rcv_name = paste(name[rcv_value == value], collapse = ",") to create results in the column such as "rcv2,rcv3".