Tags: r, function, ggplot2, tidyr, facet-wrap

How to solve Error: Selections can't have missing values


I have written code that works fine on its own, but it fails when I wrap it in a function.

My sample data is below:

set.seed(34)
children <- data.frame(
  ID = 1:100,
  gender = as.integer(sample(c(1,2),100,replace = TRUE)),
  height = ifelse(children$gender=="1", sample(120:140), sample(110:130)),
  weight = ifelse(children$gender=="1", sample(25:35), sample(15:25)),
  ave_sleep = ifelse(children$gender=="1" & children$height > 130, sample(7:9),
                     ifelse(children$gender=="1" & children$height <= 130, sample(4:6),
                            ifelse(children$gender=="2" & children$height > 120, sample(7:9), sample(4:6)))))
childrenNA <- bind_cols(children[1],missForest::prodNA(children[-1],noNA=0.1))

And my code below works fine.

childrenNA %>%
  gather(-gender, key="key", value="val") %>%
  mutate(missing=is.na(val)) %>%
  mutate(gender=coalesce(gender, 0)) %>%
  filter(missing==TRUE) %>%
  group_by(gender, key, missing) %>%
  ggplot() +
  stat_count(aes(y=key)) +
  facet_wrap(~gender) +
  labs(x='no_missing_values', y="variable") +
  coord_flip()

[Image: my graph]

However, when I wrap the same code in a function, I get Error: Selections can't have missing values. Below is my attempt at the function.

miss_group <- function(df, facet) {
  df %>%
    gather(-facet, key="key", value="val") %>%
    mutate(missing=is.na(val)) %>%
    mutate(facet=coalesce(facet, 0)) %>%
    filter(missing==TRUE) %>%
    group_by(facet, key, missing) %>%
    ggplot() +
    stat_count(aes(y=key)) +
    facet_wrap(~facet) +
    labs(x='no_missing_values', y="variable") +
    coord_flip()
}

Could you please help me solve the error?


Solution

  • Your data generation code does not run as posted: inside data.frame(), expressions such as children$gender == 1 refer to the children data frame before it has been created. I rewrote it so that the data frame is created first and the dependent columns are added afterwards, which makes the example reproducible:

    #packages
    library(tidyr)
    library(dplyr)
    library(ggplot2)
    
    #make data set
    set.seed(34)
    children <- data.frame(ID = 1:100, gender = as.integer(sample(c(1,2),100, replace = TRUE)),
                            height = NA, weight = NA, ave_sleep = NA)
    children$height <- ifelse(children$gender==1, sample(120:140), sample(110:130))
    children$weight <- ifelse(children$gender==1, sample(25:35), sample(15:25))
    children$ave_sleep <- ifelse(children$gender==1 & children$height > 130, sample(7:9),
                                 ifelse(children$gender==1 & children$height <= 130, sample(4:6),
                                 ifelse(children$gender==2 & children$height > 120, sample(7:9), sample(4:6))))
    childrenNA <- bind_cols(children[1],missForest::prodNA(children[-1],noNA=0.1))
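
    The point about evaluation order can be seen in a much smaller sketch. The names demo_df, x and y are hypothetical and chosen only for illustration, and the first attempt is assumed to run in a fresh session in which demo_df does not exist yet:

```r
# Hypothetical names (demo_df, x, y), used for illustration only.

# Attempt 1: y refers to demo_df$x while demo_df is still being constructed.
# The error is caught so that the script keeps running.
attempt <- tryCatch(
  data.frame(x = 1:3, y = demo_df$x * 2),
  error = function(e) conditionMessage(e)
)
print(attempt)  # an error message, because demo_df does not exist yet

# Attempt 2: create the data frame first, then add the dependent column.
demo_df <- data.frame(x = 1:3)
demo_df$y <- demo_df$x * 2
print(demo_df)
```

    If a data frame with that name is already present from earlier work in the same session, the first attempt silently uses the old object instead of failing, which may be why such code can appear to work interactively.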
    

    I can't reproduce your exact error message, but I think the problem lies in how the facet parameter is passed to your function and then used inside it. I assume you want to pass a bare variable name, e.g. miss_group(df, gender). Inside the function, that name then has to refer to the corresponding column of df. One way to achieve this is to capture the argument with enquo() and unquote it with !!. The mutate() step stores the NA-replaced values in a new column literally named facet, which is what facet_wrap(~facet) then uses. I am not sure this is the best approach, but it does produce the plot you intended.

    #define the plotting function
    miss_group <- function(df, facetname) {
      facet <- enquo(facetname)
      df %>%
        gather(-!!facet, key="key", value="val") %>%
        mutate(missing=is.na(val)) %>%
        mutate(facet=coalesce(!!facet, 0)) %>%
        filter(missing==TRUE) %>%
        group_by(!!facet, key, missing) %>%
        ggplot() +
        stat_count(aes(y=key)) +
        facet_wrap(~facet) +
        labs(x='no_missing_values', y="variable") +
        coord_flip()
    }
    
    
    #create plot with function
    miss_group(childrenNA, gender)
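
    As a possible alternative, here is a minimal sketch that uses the embrace operator {{ }} (available when rlang 0.4.0 or later is installed) in place of enquo() and !!, and pivot_longer() (tidyr 1.0.0 or later) in place of gather(), which tidyr now documents as superseded. The function name miss_group2 and the toy data frame are hypothetical and only serve as an illustration. The replacement value 0 mirrors the original code.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical variant of miss_group(); the name miss_group2 is illustrative only.
miss_group2 <- function(df, facetvar) {
  df %>%
    # reshape every column except the facet column into long format
    pivot_longer(cols = -{{ facetvar }}, names_to = "key", values_to = "val") %>%
    # keep only the cells that are missing
    filter(is.na(val)) %>%
    # store the facet variable under a fixed name, replacing NA with 0
    mutate(facet = coalesce({{ facetvar }}, 0)) %>%
    ggplot() +
    stat_count(aes(y = key)) +
    facet_wrap(~facet) +
    labs(x = "no_missing_values", y = "variable") +
    coord_flip()
}

# Toy data (made up for this sketch): grp is the facet column,
# a and b each contain two missing values.
toy <- data.frame(
  grp = c(1, 1, 2, NA),
  a   = c(NA, 5L, NA, 7L),
  b   = c(1L, NA, 3L, NA)
)
p <- miss_group2(toy, grp)
```

    With the data from the question, the call would be miss_group2(childrenNA, gender). Note that pivot_longer() needs the reshaped columns to share a compatible type, which holds here because all of them are integers.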