I want to use dplyr::coalesce
to find the first non-missing value between pairs of variables in a dataframe containing multiple pairs of variable. The goal is to create a new dataframe with now only one copy for each pair of variable (a coalesce variable without NA values).
Here is an example:
df <- data.frame(
A_1=c(NA, NA, 3, 4, 5),
A_2=c(1, 2, NA, NA, NA),
B_1=c(NA, NA, 13, 14, 15),
B_2=c(11, 12, NA, NA, NA))
Expected output:
A B
1 11
2 12
3 13
4 14
5 15
I am guessing a mix of dplyr::coalesce
with maybe dplyr::mutate_at
based on regular expression could be use but I am not sure how to do it. Is there a way to complete this task with the tidyverse grammar?
Thanks!
EDIT: thanks everyone for your answers! However, I should have included the naming convention for my variables to facilitate the transfer of your answers to my actual problem.. I am sorry about that. My variables are geochemistry variables named in two parts (name of chemical element plus name of core).
Example: Al_TAC4.25.275
where Al
is the element and TAC4.25.275
is the core. I want to coalesce the data from 3 different cores (second part of name) for each element (first part of name). I have 25 pairs of element to coalesce.
You could use transmute, e.g.
library(dplyr)
df <- data.frame(
A_1 = c(NA, NA, 3, 4, 5),
A_2 = c(1, 2, NA, NA, NA),
B_1 = c(NA, NA, 13, 14, 15),
B_2 = c(11, 12, NA, NA, NA)
)
df %>%
transmute(A = coalesce(A_1, A_2),
B = coalesce(B_1, B_2))
#> A B
#> 1 1 11
#> 2 2 12
#> 3 3 13
#> 4 4 14
#> 5 5 15
Created on 2021-12-22 by the reprex package (v2.0.1)
Another option, if you have lots of "A_*" and "B_*" columns (source: Romain François, user: @Romain Francois):
library(dplyr)
df <- data.frame(
A_1 = c(NA, NA, 3, 4, 5),
A_2 = c(1, 2, NA, NA, NA),
B_1 = c(NA, NA, 13, 14, 15),
B_2 = c(11, 12, NA, NA, NA)
)
coacross <- function(...) {
coalesce(!!!across(...))
}
df %>%
transmute(A = coacross(starts_with("A_")),
B = coacross(starts_with("B_")))
#> A B
#> 1 1 11
#> 2 2 12
#> 3 3 13
#> 4 4 14
#> 5 5 15
Created on 2021-12-22 by the reprex package (v2.0.1)
Based on your updated question, you don't have lots of "A_*" or "B_*" columns, but instead lots of "*_1", "*_2", and "*_3" columns. I think this is the most straightforward solution for your use-case:
library(dplyr)
df <- data.frame(Al_TAC4.25.275 = c(1, 1, 1, NA, NA, NA),
Al_TAC4.25.276 = c(NA, NA, 2, 2, 2, NA),
Al_TAC4.25.277 = c(NA, NA, 3, NA, NA, 3),
Au_TAC4.25.275 = c(1, 1, 1, NA, NA, NA),
Au_TAC4.25.276 = c(NA, NA, 2, 2, 2, NA),
Au_TAC4.25.277 = c(NA, NA, 3, NA, NA, NA),
Ar_TAC4.25.275 = c(1, 1, 1, NA, NA, 1),
Ar_TAC4.25.276 = c(NA, NA, 2, 2, 2, 2),
Ar_TAC4.25.277 = c(NA, NA, 3, NA, NA, 3))
df
#> Al_TAC4.25.275 Al_TAC4.25.276 Al_TAC4.25.277 Au_TAC4.25.275 Au_TAC4.25.276
#> 1 1 NA NA 1 NA
#> 2 1 NA NA 1 NA
#> 3 1 2 3 1 2
#> 4 NA 2 NA NA 2
#> 5 NA 2 NA NA 2
#> 6 NA NA 3 NA NA
#> Au_TAC4.25.277 Ar_TAC4.25.275 Ar_TAC4.25.276 Ar_TAC4.25.277
#> 1 NA 1 NA NA
#> 2 NA 1 NA NA
#> 3 3 1 2 3
#> 4 NA NA 2 NA
#> 5 NA NA 2 NA
#> 6 NA 1 2 3
names(df) %>%
split(str_extract(., '[:alpha:]+')) %>%
map_dfc(~ coalesce(!!!df[.x][c(1,2,3)]))
#> # A tibble: 6 × 3
#> Al Ar Au
#> <dbl> <dbl> <dbl>
#> 1 1 1 1
#> 2 1 1 1
#> 3 1 1 1
#> 4 2 2 2
#> 5 2 2 2
#> 6 3 1 NA
# change the order of the list to change the 'priority'
names(df) %>%
split(str_extract(., '[:alpha:]+')) %>%
map_dfc(~ coalesce(!!!df[.x][c(3,2,1)]))
#> # A tibble: 6 × 3
#> Al Ar Au
#> <dbl> <dbl> <dbl>
#> 1 1 1 1
#> 2 1 1 1
#> 3 3 3 3
#> 4 2 2 2
#> 5 2 2 2
#> 6 3 3 NA
names(df) %>%
split(str_extract(., '[:alpha:]+')) %>%
map_dfc(~ coalesce(!!!df[.x][c(2,1,3)]))
#> # A tibble: 6 × 3
#> Al Ar Au
#> <dbl> <dbl> <dbl>
#> 1 1 1 1
#> 2 1 1 1
#> 3 2 2 2
#> 4 2 2 2
#> 5 2 2 2
#> 6 3 2 NA
Created on 2021-12-22 by the reprex package (v2.0.1)