Search code examples
rtidyversepairwise

Replacing value depending on paired column


I have a dataframe with two columns per sample (n > 1000 samples):

df <- data.frame(
    "sample1.a" = 1:5, "sample1.b" = 2,
    "sample2.a" = 2:6, "sample2.b" = c(1, 3, 3, 3, 3),
    "sample3.a" = 3:7, "sample3.b" = 2)

If there is a zero in column .b, the correspsonding value in column .a should be set to NA.

I thought to write a function over colnames (without suffix) to filter each pair of columns and conditional exchaning values. Is there a simpler approach based on tidyverse?


Solution

  • We can split the data.frame into a list of data.frames and do the replacement in base R

    df1 <- do.call(cbind, lapply(split.default(df,  
       sub("\\..*", "", names(df))), function(x) {
                 x[,1][x[2] == 0] <- NA
          x}))
    

    Or another option is Map

    acols <- endsWith(names(df), "a")
    bcols <- endsWith(names(df), "b")
    df[acols] <- Map(function(x, y) replace(x, y == 0, NA), df[acols], df[bcols])
    

    Or if the columns are alternate with 'a', 'b' columns, use a logical index for recycling, create the logical matrix with 'b' columns and assign the corresponding values in 'a' columns to NA

    df[c(TRUE, FALSE)][df[c(FALSE, TRUE)] == 0] <- NA
    

    or an option with tidyverse by reshaping into 'long' format (pivot_longer), changing the 'a' column to NA if there is a correspoinding 0 in 'a', and reshape back to 'wide' format with pivot_wider

    library(dplyr)
    library(tidyr)
    df %>%
        mutate(rn = row_number()) %>%
       pivot_longer(cols = -rn, names_sep="\\.",
            names_to = c('group', '.value')) %>% 
       mutate(a = na_if(b, a == 0)) %>%
       pivot_wider(names_from = group, values_from = c(a, b)) %>% 
       select(-rn)
    # A tibble: 5 x 6
    #  a_sample1 a_sample2 a_sample3 b_sample1 b_sample2 b_sample3
    #      <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
    #1         2         1         2         2         1         2
    #2         2         3         2         2         3         2
    #3         2         3         2         2         3         2
    #4         2         3         2         2         3         2
    #5         2         3         2         2         3         2