I have written code that works fine on its own, but it does not work when I wrap it in a function.
My sample data is below:
set.seed(34)
children <- data.frame(
ID = 1:100,
gender = as.integer(sample(c(1,2),100,replace = TRUE)),
height = ifelse(children$gender=="1", sample(120:140), sample(110:130)),
weight = ifelse(children$gender=="1", sample(25:35), sample(15:25)),
ave_sleep = ifelse(children$gender=="1" & children$height > 130, sample(7:9),
ifelse(children$gender=="1" & children$height <= 130, sample(4:6),
ifelse(children$gender=="2" & children$height > 120, sample(7:9), sample(4:6)))))
childrenNA <- bind_cols(children[1],missForest::prodNA(children[-1],noNA=0.1))
The code below works fine:
childrenNA %>%
gather(-gender, key="key", value="val") %>%
mutate(missing=is.na(val)) %>%
mutate(gender=coalesce(gender, 0)) %>%
filter(missing==TRUE) %>%
group_by(gender, key, missing) %>%
ggplot() +
stat_count(aes(y=key)) +
facet_wrap(~gender) +
labs(x='no_missing_values', y="variable") +
coord_flip()
However, inside a function the same code fails with Error: Selections can't have missing values. This is what I did to create the function:
miss_group <- function(df, facet) {
df %>%
gather(-facet, key="key", value="val") %>%
mutate(missing=is.na(val)) %>%
mutate(facet=coalesce(facet, 0)) %>%
filter(missing==TRUE) %>%
group_by(facet, key, missing) %>%
ggplot() +
stat_count(aes(y=key)) +
facet_wrap(~facet) +
labs(x='no_missing_values', y="variable") +
coord_flip()
}
Could you please help me to solve the error?
Your data generation code does not work, because the columns of the data frame (children$gender == 1 etc.) cannot be evaluated inside data.frame() before the data frame, and therefore these columns, has been created. I updated your code to make it reproducible:
#packages
library(tidyr)
library(dplyr)
library(ggplot2)
#make data set
set.seed(34)
children <- data.frame(ID = 1:100, gender = as.integer(sample(c(1,2),100, replace = TRUE)),
height = NA, weight = NA, ave_sleep = NA)
children$height <- ifelse(children$gender==1, sample(120:140), sample(110:130))
children$weight <- ifelse(children$gender==1, sample(25:35), sample(15:25))
children$ave_sleep <- ifelse(children$gender==1 & children$height > 130, sample(7:9),
ifelse(children$gender==1 & children$height <= 130, sample(4:6),
ifelse(children$gender==2 & children$height > 120, sample(7:9), sample(4:6))))
childrenNA <- bind_cols(children[1],missForest::prodNA(children[-1],noNA=0.1))
I can't replicate your exact error message, but I think the problem is how you pass the parameter facet to your function and how that parameter is then used inside it. I assume you want to pass the bare name of a variable to the function, e.g. miss_group(df, gender). Within the function, however, this name has to be resolved to the corresponding column in df. One way to do this is to capture the argument with enquo() and unquote it with !! wherever the column is needed. I am not sure this is the best way to do it, but it does produce the plot you intended.
#define the plotting function
miss_group <- function(df, facetname) {
facet <- enquo(facetname)
df %>%
gather(-!!facet, key="key", value="val") %>%
mutate(missing=is.na(val)) %>%
mutate(facet=coalesce(!!facet, 0)) %>%
filter(missing==TRUE) %>%
group_by(!!facet, key, missing) %>%
ggplot() +
stat_count(aes(y=key)) +
facet_wrap(~facet) +
labs(x='no_missing_values', y="variable") +
coord_flip()
}
#create plot with function
miss_group(childrenNA, gender)
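As a possible alternative to enquo() and !!, here is a minimal sketch using the embrace operator {{ }}, which captures and unquotes in one step. It assumes rlang 0.4.0 or later and tidyr 1.0.0 or later (for pivot_longer()). The function name miss_group_embrace is made up for this example, and the 0L fill value assumes the facet column is an integer, as gender is in the sample data.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical variant of miss_group(); the name is illustrative only.
miss_group_embrace <- function(df, facetname) {
  df %>%
    # copy the chosen column into a fixed name, replacing NA with 0
    # (0L assumes an integer column, as in the sample data)
    mutate(facet = coalesce({{ facetname }}, 0L)) %>%
    # drop the original column so it is not reshaped with the others
    select(-{{ facetname }}) %>%
    # long format: one row per (row, variable) pair
    pivot_longer(-facet, names_to = "key", values_to = "val") %>%
    # keep only the missing cells
    filter(is.na(val)) %>%
    ggplot() +
    stat_count(aes(y = key)) +
    facet_wrap(~facet) +
    labs(x = "no_missing_values", y = "variable") +
    coord_flip()
}

# Usage with the data set built above:
# miss_group_embrace(childrenNA, gender)
```

Because the facet column is copied into a fixed name before reshaping, facet_wrap(~facet) can refer to it directly and no unquoting is needed in the plotting part. One caveat: this sketch would not work as intended if the column passed in is itself named facet.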