I'm working with panel data where some of the questionnaire items get asked only once while most others are asked several times across different waves. So let's assume there I have survey responses on a categorical variable.
Even though it's likely panelists will be stable in their reponse on a given question, it may occur that they change their opinion, or even drop out of the panel. Others might enter the panel at a later point in time (in a later wave).
So let's say this leads to a dataset which looks like this, where var_1a and var_1b are two variables that measure the same thing just in two different waves:
mydata<-data.frame(
var_1a = c(1,2,3,4,5,NA),
var_1b = c(1,2,5,5,NA,NA)
)
Now I would like to automatically combine these variables into one single variable across the two waves. As a condition, I would like to take the mean if respondents have changed their opinion from one wave to the other. Furthermore, I would like to keep the reponse from wave 1 even if the respondent has droped out of the panel in wave 2.
The way I have been doing this in R is by using this function:
combo<-function(x,...){
V<-rowMeans(x[, c(...) ], na.rm=T) *
ifelse(rowMeans(is.na(x[, c(...) ])) == ncol(x[, c(...) ]), NA, 1)
ifelse(is.nan(V),NA,V)}
For some reason, this introduces "NaN" if all variables are consistently NA across all waves, hence the last line of code in the function.
This works just fine and I get what I wanted:
> mydata$combo_var<-combo(mydata, c("var_1a","var_1b"))
> mydata
var_1a var_1b combo_var
1 1 1 1.0
2 2 2 2.0
3 3 -3 0.0
4 4 9 6.5
5 5 NA 5.0
6 NA NA NA
7 NA NA NA
8 NA NA NA
9 NA NA NA
10 NA NA NA
However, I was wondering if there's a better and easier way to do this. Is there a way I could use dplyr to do the same? Any tidyverse solution?
Some tidyverse
options:
Data
mydata <- data.frame(
var_1a = c(1,2,3,4,5,NA),
var_1b = c(1,2,5,5,NA,NA)
)
library(dplyr)
library(purrr)
mydata <- mydata %>%
mutate(
combo_var = pmap_dbl(select(., contains("var")),
~mean(c(...), na.rm = T))
)
Result:
var_1a var_1b combo_var
1 1 1 1.0
2 2 2 2.0
3 3 5 4.0
4 4 5 4.5
5 5 NA 5.0
6 NA NA NaN
mydata <- mydata %>%
mutate(
combo_var = rowMeans(select(., contains("var")), na.rm = T)
)
Result:
var_1a var_1b combo_var
1 1 1 1.0
2 2 2 2.0
3 3 5 4.0
4 4 5 4.5
5 5 NA 5.0
6 NA NA NaN