
How to create a ggplot when the answers are FALSE or TRUE?


How can I create a plot with ggplot when my answers are TRUE or FALSE?

This is my code:

t.obese<-master1%>%
  filter(Income>0,obese==TRUE)%>%
  select(Income,obese)

> head(t.obese)
  Income obese
1  21600    TRUE
2   4000    TRUE
3  12720    TRUE
4  26772    TRUE

When I try to create a plot, R tells me: "Don't know how to automatically pick scale for object of type haven_labelled/vctrs_vctr/double. Defaulting to continuous. Error: stat_count() can only have an x or y aesthetic."

Thank you!

> dput(t.obese[1:10, ])
structure(list(Income = structure(c(1944, 4000, 16000, 19200, 
22800, 21600, 18000, 18000, 2000, 18000), label = "Wages,Salary from main job", 
format.stata = "%42.0g", labels = c(`[-5] in Fragebogenversion nicht enthalten` = -5, 
`[-2] trifft nicht zu` = -2), class = c("haven_labelled", "vctrs_vctr", 
"double")), obese = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, 
TRUE, TRUE, TRUE)), row.names = c(NA, 10L), class = "data.frame")

Solution

  • To compare the Income distribution by obesity status, the data must contain both obese = TRUE and obese = FALSE rows. The filter obese==TRUE in the question keeps only one group, so there is nothing to compare against.

    Because only obese rows were posted, I simulated a non_obese dataset purely for the sake of the comparison. I also removed the haven_labelled class from Income with haven::zap_labels(), since it was causing issues when rendering the reprex.

    I hope the following helps you get started:

    library(dplyr)
    library(ggplot2)
    library(haven)
    
    obese <- 
    structure(list(Income = structure(c(1944, 4000, 16000, 19200, 
                                        22800, 21600, 18000, 18000, 2000, 18000), 
                                      label = "Wages,Salary from main job", 
                                      format.stata = "%42.0g", 
                                      labels = c(`[-5] in Fragebogenversion nicht enthalten` = -5,
                                                 `[-2] trifft nicht zu` = -2), 
                                      class = c("haven_labelled", "vctrs_vctr","double")), 
                   obese = c(TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE,TRUE, TRUE, TRUE)), 
              row.names = c(NA, 10L), class = "data.frame"
              )
    
    
    # remove the haven/labelled class of the income variable
    obese <- 
      obese %>% 
      haven::zap_labels() 
    
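    # simulate a non-obese comparison group (illustrative only, not real data);
    # rnorm(1, ...) draws a single value, so every income is shifted down
    # by the same amount of roughly 1000
    non_obese <- 
      obese %>% 
      mutate(
        Income = Income - rnorm(1, mean = 1000, sd = 50),
        obese  = !obese
      )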
    
    
    
    full_data <- 
      bind_rows(obese, non_obese)
    
    
    # Box plot 
    full_data %>% 
      ggplot(
        aes(obese, Income)
      )+
      geom_boxplot(width = 0.5)+
      geom_point(position = position_jitter(width  = 0.05))
    

    # Density plot
    full_data %>% 
      ggplot(
        aes(Income,fill = obese)
      )+
      geom_density(alpha = 0.5)
    

    Created on 2020-12-03 by the reprex package (v0.3.0)
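
As for the messages quoted in the question: the plotting call was not posted, so the following is only a guess at the cause. The scale message typically appears when a haven_labelled column is mapped to an aesthetic, and the stat_count() error is what geom_bar() raises when both x and y are mapped. A minimal sketch with made-up values (the data frame df and the object names are hypothetical):

```r
library(ggplot2)
library(haven)

# Toy data (made-up values) with a labelled Income column,
# similar to what a Stata import would give
df <- data.frame(obese = c(TRUE, TRUE, TRUE, FALSE, FALSE))
df$Income <- labelled(
  c(1944, 4000, 16000, 19200, 22800),
  labels = c(`[-2] trifft nicht zu` = -2)
)

# Dropping the value labels leaves a plain numeric column,
# which avoids the "Don't know how to automatically pick scale" message
df$Income <- zap_labels(df$Income)

# geom_bar() uses stat_count(), which counts rows itself,
# so mapping both x and y fails with the error from the question:
# ggplot(df, aes(obese, Income)) + geom_bar()

# Option 1: map only x and let geom_bar() count the rows per group
p_count <- ggplot(df, aes(obese)) + geom_bar()

# Option 2: summarise first, then draw the precomputed values with geom_col()
mean_income <- aggregate(Income ~ obese, data = df, FUN = mean)
p_mean <- ggplot(mean_income, aes(obese, Income)) + geom_col()
```

Printing p_count or p_mean draws the plot. Which option fits depends on whether you want the number of respondents per group or a summary of their income; for a full comparison of distributions, the box plot and density plot above are probably the better choice.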