Search code examples
rdplyracross

How to replace values in multiple columns based on value from left-adjacent column


I have data similar to this (though way bigger dataset):

  correct.trial1 RT.trial1 correct.trial2 RT.trial2 correct.trial3 RT.trial3
1              1       473              0       337              1       426
2              1       496              1       407              1       421
3              1       368              0       405              1       470
4              0       333              1       475              0       473
5              0       435              0       312              1       402

We can make this sample with this:

set.seed(12)
df <- data.frame(correct.trial1 = sample(0:1, 5, replace=T),
                 RT.trial1 = sample(300:500, 5, replace=T),
                 correct.trial2 = sample(0:1, 5, replace=T),
                 RT.trial2 = sample(300:500, 5, replace=T),
                 correct.trial3 = sample(0:1, 5, replace=T),
                 RT.trial3 = sample(300:500, 5, replace=T))

I would like to replace values starts_with("RT.trial") with NA when the adjacent (left) column starts_with("correct.trial") has value 0. Of course, I could do it one at a time, e.g.,:

library(dplyr)
df %>%
  mutate(RT.trial1 = ifelse(correct.trial1==1, RT.trial1, NA),
         RT.trial2 = ifelse(correct.trial2==1, RT.trial2, NA),
         RT.trial3 = ifelse(correct.trial3==1, RT.trial3, NA))

So it would look like this:

  correct.trial1 RT.trial1 correct.trial2 RT.trial2 correct.trial3 RT.trial3
1              1       473              0        NA              1       426
2              1       496              1       407              1       421
3              1       368              0        NA              1       470
4              0        NA              1       475              0        NA
5              0        NA              0        NA              1       402

But that would be impracticable with thousands of columns.

Question

How can I do this for all columns simultaneously? (Note: I would prefer a dplyr solution, and using across would be preferable to using mutate_at.)

Attempt

Not sure, but based on this related post, it would (maybe) look something like this:

df %>%
  mutate_at(vars(starts_with("RT.trial")),
  ~ifelse(vars(starts_with("correct.trial"))==0, NA, .x))

Solution

  • We could reshape to 'long' format and then do the transformation

    library(dplyr)
    library(tidyr)
    df %>% 
        mutate(rn = row_number()) %>% 
        pivot_longer(cols = -rn, names_to = c(".value", "grp"), 
              names_sep="\\.") %>%
        mutate(RT = case_when(as.logical(correct) ~ RT)) %>% 
        pivot_wider(names_from = grp, values_from = c(correct, RT), 
              names_sep = ".") %>%
        select(names(df))
    

    -output

    # A tibble: 5 x 6
    #  correct.trial1 RT.trial1 correct.trial2 RT.trial2 correct.trial3 RT.trial3
    #           <int>     <int>          <int>     <int>          <int>     <int>
    #1              0        NA              0        NA              0        NA
    #2              1       394              1       458              0        NA
    #3              0        NA              1       337              0        NA
    #4              1       479              0        NA              0        NA
    #5              0        NA              0        NA              0        NA
    

    In base R, this can be done in a more easier way

    i1 <- grepl('correct', names(df))
    df[!i1] <- (NA^!df[i1]) * df[!i1]
    

    data

    df <- structure(list(correct.trial1 = c(0L, 1L, 0L, 1L, 0L), RT.trial1 = c(417L, 
    394L, 345L, 479L, 368L), correct.trial2 = c(0L, 1L, 1L, 0L, 0L
    ), RT.trial2 = c(382L, 458L, 337L, 406L, 306L), correct.trial3 = c(0L, 
    0L, 0L, 0L, 0L), RT.trial3 = c(469L, 364L, 361L, 359L, 309L)),
     class = "data.frame", row.names = c("1", 
    "2", "3", "4", "5"))