Search code examples
rdplyrprefixcoalesceacross

Coalesce inside an across statement using column name and a prefix


Again I feel like I should be able to do something to seems not quite complicated but I can't figure it out.

I have a dataframe df with variables a1, v_a1, a2, v_a2 and so on. All I want to do is mutate my dataframe so that a1 = coalesce(a1, v_a1)and a2 = coalesce(a2, v_a2)

I tried many different thing but keep it doesn't quite work yet.

Here is a toy example and things I tried :

df <- data.frame(
  a1 = c(1,NA,2,NA),
  v_a1 = c(NA,0,NA,5),
  a2 = c(4,NA,8,NA),
  v_a2 = c(NA,7,NA,NA),
  a11 = c("just here","to force","the use of paste0","inside the coalesce"))
df %>% mutate(
  across(c(a1,a2),
         ~ coalesce(!!!select(matches(paste0(cur_column(),"$"))))))

This throws an error about cur_column() which "Must only be used inside across()". I had the same error when first declaring a function with a df and variable parameters to wrap the coalesce but the it seemed not to appreciate the .data[[cur_column()]] inside the across

df %>% mutate(
  across(c(a1,a2),
         ~ coalesce(across(all_of(c(cur_column(), paste0("v_", cur_column())))))))

This one doesn't have an error but the output is really weird turning the tibble into string and making some pasting. It puzzles me there.

This topic is veeeery close How to coalesce chunks of columns based on a prefix pattern but the coalesce uses a raw string inside the contains which I don't have (hence the need for cur_column and moreover it uses a for loop while I think I should be able to do it within an across...but maybe not !

Any help or piece of advice much appreciated !


Solution

  • Please try the below code

    library(tidyverse)
    
    df2 <- purrr::map2_dfc(df[,str_detect(names(df),'\\ba1\\b|\\ba2\\b')], df[,str_detect(names(df),'v_')], ~ coalesce(.x,.y) )
    
    df3 <- cbind(df2,df[,str_detect(names(df),'v_|a11')])
    
    
      a1 a2 v_a1 v_a2                 a11
    1  1  4   NA   NA           just here
    2  0  7    0    7            to force
    3  2  8   NA   NA   the use of paste0
    4  5 NA    5   NA inside the coalesce
    

    with across please check the code below

    df %>% mutate(across(c(a1,a2), ~ coalesce(.x, get(paste('v',cur_column(),sep = '_'))) ))
    
    
      a1 v_a1 a2 v_a2                 a11
    1  1   NA  4   NA           just here
    2  0    0  7    7            to force
    3  2   NA  8   NA   the use of paste0
    4  5    5 NA   NA inside the coalesce