r ggplot2 geom-bar

geom_bar counts group and not number of cases


I'm quite new to R, and since googling and browsing the questions on here has not helped so far, I decided to post my own.

For descriptive statistics I would like a geom_bar plot. My data frame consists of 21 IDs, each with one or more diagnoses. The IDs are numbers from 1 to 21, and each diagnosis is coded as 0 (no) or 1 (yes). The code I have so far plots bars next to each other, but instead of counting the number of cases per group, it plots the number of people per group. So for each diagnosis I get two bars that always show the number of people per group (attempters vs. non-attempters) and not the number of cases.

My original data frame looks something like this:

code MDD Anxiety PTBS age attempters
01 0 1 1 17 1
02 1 1 0 53 0
03 0 0 1 32 0
04 0 1 0 60 0

but with a lot of columns I don't actually need for my thesis.

First I reshaped my data from wide to long and kept only the columns I need:

df_long <- data_gesamt %>%
  select(code, MDD, Anxiety, PTBS, attempters) %>%
  group_by(code, attempters) %>%
  tidyr::gather(key = predictors,
                value = severity,
                MDD, Anxiety, PTBS) %>% 
  mutate(attempters = as.factor(attempters)) %>% 
  drop_na(attempters)

which got me a data frame as follows:

code attempters predictors severity
01 1 MDD 0
02 0 MDD 1
03 0 MDD 0
04 0 MDD 0
01 1 Anxiety 1
02 0 Anxiety 1
03 0 Anxiety 0
04 0 Anxiety 1
01 1 PTBS 1
02 0 PTBS 0
03 0 PTBS 1
04 0 PTBS 0
01 1 age 17
02 0 age 53
03 0 age 32
04 0 age 60

and then used the following to plot:

plot <- df_long %>%
  ggplot(aes(x = attempters, fill = attempters)) +
  geom_bar() +
  facet_grid(.~ predictors) +
  theme(legend.position = "bottom")

plot

I need a count of how many people with MDD, Anxiety and PTBS there are per group, plus the mean age (I could leave that one out, though). So far I get the number of people per group (non-attempters vs. attempters) ... What am I missing, or what is wrong?

I would expect something like this: two bars per disorder, one for each group, with the number of people in that group who have the disorder on the y-axis.


Solution

  • [Updated: revised the part that misread the nature of the data]

    It is recommended to summarize the data before you plot it, so that ggplot only handles the visualization and does no additional calculation. This also lets you double-check that the computed statistic is what you want, instead of leaving all the calculation to ggplot behind the scenes.

    library(dplyr)
    library(tidyr)
    library(ggplot2)
    
    # dput of your original table
    df <- structure(list(code = 1:4, MDD = c(0L, 1L, 0L, 0L), Anxiety = c(1L, 
      1L, 0L, 1L), PTBS = c(1L, 0L, 1L, 0L), age = c(17L, 53L, 32L, 
        60L), attempters = c(1L, 0L, 0L, 0L)), row.names = c(NA, -4L), 
      class = "data.frame")
    
    # Count the cases per predictor and attempter group
    graph_data <- df %>%
      # pivot to long format and keep only the records where the
      # predictor is present (value == 1)
      pivot_longer(cols = MDD:PTBS,
        names_to = "predictors", values_to = "value") %>%
      filter(value == 1) %>%
      # convert attempters to a factor, since it only takes the values 0 and 1;
      # ggplot would otherwise treat the numeric column as continuous
      group_by(predictors, attempters = factor(attempters)) %>%
      summarize(severity = n(),
        mean_age = mean(age), .groups = "drop")
    
    # the summarized data
    graph_data
    #> # A tibble: 5 x 4
    #>   predictors attempters severity mean_age
    #>   <chr>      <fct>         <int>    <dbl>
    #> 1 Anxiety    0                 2     56.5
    #> 2 Anxiety    1                 1     17  
    #> 3 MDD        0                 1     53  
    #> 4 PTBS       0                 1     32  
    #> 5 PTBS       1                 1     17
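
    One thing to watch with the summary above: filter(value == 1) removes every combination that has no cases, so MDD among attempters gets no row and that panel would show only one bar. If a zero-height bar is wanted there (an assumption about the desired plot), a minimal sketch using tidyr::complete() could look like the following. graph_data_full is a made-up name, the data frame just rebuilds the four example rows, and mean_age is left out for brevity.

    ```r
    library(dplyr)
    library(tidyr)

    # The same four example rows as in the dput above
    df <- data.frame(
      code = 1:4,
      MDD = c(0L, 1L, 0L, 0L),
      Anxiety = c(1L, 1L, 0L, 1L),
      PTBS = c(1L, 0L, 1L, 0L),
      age = c(17L, 53L, 32L, 60L),
      attempters = c(1L, 0L, 0L, 0L)
    )

    graph_data_full <- df %>%
      pivot_longer(cols = MDD:PTBS,
        names_to = "predictors", values_to = "value") %>%
      # make the factor before filtering so that both levels are kept
      mutate(attempters = factor(attempters)) %>%
      filter(value == 1) %>%
      # number of cases per predictor and attempter group
      count(predictors, attempters, name = "severity") %>%
      # add a row with severity 0 for every predictor/attempters
      # combination that has no cases
      complete(predictors, attempters, fill = list(severity = 0L))

    graph_data_full
    ```

    Passing graph_data_full to the plotting code instead of graph_data should then give two bar positions in every panel.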
    

    Here is the plotting code:

    # the plot now takes graph_data as input
    ggplot(data = graph_data) +
      # I prefer to set the mapping per geom instead of in the ggplot() call
      geom_bar(mapping = aes(
        x = attempters,
        # y is the number of cases (severity) counted earlier
        y = severity,
        # for fill/colour I prefer to specify the group explicitly, even though
        # ggplot can infer it
        fill = attempters, group = attempters),
        # stat is "identity" instead of the default "count", so the bar
        # height is taken directly from y
        stat = "identity",
        # position_dodge() keeps bars from being stacked on top of each other
        position = position_dodge()) +
      facet_grid(.~ predictors) +
      theme(legend.position = "bottom")
    

    Created on 2021-04-17 by the reprex package (v2.0.0)
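
    As for why the original plot shows the number of people: geom_bar() defaults to stat = "count", which counts rows, and the long data has one row per person in every panel whether severity is 0 or 1. A possible alternative that keeps the long data is to pass severity as the weight aesthetic, so that each bar is the sum of the 0/1 values. This is only a sketch on the four example rows, not something tested on the full data set; rows_per_panel and p are names made up for the illustration.

    ```r
    library(dplyr)
    library(tidyr)
    library(ggplot2)

    # The same four example rows as in the question
    df <- data.frame(
      code = 1:4,
      MDD = c(0L, 1L, 0L, 0L),
      Anxiety = c(1L, 1L, 0L, 1L),
      PTBS = c(1L, 0L, 1L, 0L),
      age = c(17L, 53L, 32L, 60L),
      attempters = c(1L, 0L, 0L, 0L)
    )

    df_long <- df %>%
      pivot_longer(cols = MDD:PTBS,
        names_to = "predictors", values_to = "severity") %>%
      mutate(attempters = factor(attempters))

    # Rows per panel and group: always the number of people,
    # regardless of the diagnosis values
    rows_per_panel <- count(df_long, predictors, attempters)

    # With weight = severity the default count stat sums the 0/1 values,
    # so the bar height is the number of cases
    p <- ggplot(df_long,
        aes(x = attempters, weight = severity, fill = attempters)) +
      geom_bar() +
      facet_grid(. ~ predictors) +
      theme(legend.position = "bottom")

    p
    ```

    With the pre-summarized graph_data, geom_col() is equivalent to geom_bar(stat = "identity").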