I have some table data that has been scattered across around 1000 variables in a dataset. Most are split across 2 variables, and I can piece together the data using coalesce, however this is pretty inefficient for some variables which are instead spread across >10. Is there are a better/more efficient way?
The syntax I have written so far is:
scattered_data <- df %>%
select(id, contains("MASS9A_E2")) %>%
#this brings in all the variables for this one question that start with this string
mutate(speciality = coalesce(MASS9A_E2_C4_1,MASS9A_E2_C4_2,MASS9A_E2_C4_3, MASS9A_E2_C4_4, MASS9A_E2_C4_5, MASS9A_E2_C4_6, MASS9A_E2_C4_7, MASS9A_E2_C4_8, MASS9A_E2_C4_9, MASS9A_E2_C5_1,MASS9A_E2_C5_2,MASS9A_E2_C5_3, MASS9A_E2_C5_4, MASS9A_E2_C5_5, MASS9A_E2_C5_6, MASS9A_E2_C5_7, MASS9A_E2_C5_8, MASS9A_E2_C5_9))
As I have this for 28 MASS questions and would really love to be able to collapse these down a bit quicker.
You can use do.call()
to take all columns except id
as input of coalesce()
.
library(dplyr)
df %>%
select(id, contains("MASS9A_E2")) %>%
mutate(speciality = do.call(coalesce, select(df, -id)))
In addition, you can call coalesce()
iteratively by Reduce()
.
df %>%
select(id, contains("MASS9A_E2")) %>%
mutate(speciality = Reduce(coalesce, select(df, -id)))