Search code examples
rggplot2bar-charttidyversestacked-chart

Why does the stacked bar chart not match values in table?


I have a dataset containing cities and prices for specific items in those cities for going out (e.g. taxi, drinks, dinner etc.) - dataset can be found here: https://data.world/makeovermonday/2018w48

I calculated the total sum to pay for a party night and a date night:

    CostNightPrepared <- CostNight  %>%
  group_by(City, Category) %>%
  mutate(TotalCost = sum(Cost, na.rm = TRUE))%>%
  arrange(desc(Category), TotalCost)

To plot it out:

Visual <- ggplot(CostNightPrepared, aes(TotalCost, fct_rev(fct_reorder(City, TotalCost)), fill=Category)) + 
geom_col(position = "stack") +
geom_text(aes(label = round(TotalCost, 1)), position = position_dodge(1))

it gives me the following output:

enter image description here

If you notice, for example, the last city, Zurich, has the value for 179 for "Party Night", however, the column reachers to around 800 on x axis! The same goes for all other columns - they do not match their values for both the "Date night" and "Party Night". What is the issue here?

If I do the same code, but using position = dodge for geom_col(), then it works:

Visual <- ggplot(CostNightPrepared, aes(TotalCost, fct_rev(fct_reorder(City, TotalCost)), fill=Category)) + 
  geom_col(position = "dodge") +
  geom_text(aes(label = round(TotalCost, 1)), position = position_dodge(1))

Here's the output:

enter image description here

As you can see, the values match their correspond column sizes (lenghts) on the x axis.

so why, when using position = "dodge", my columns don't match the actual values in the dataset and have arbitrary values on the x axis?


Solution

  • I think you want summarize instead of mutate. By using mutate, you got the City/Category total for every row, and then fed each of those rows into ggplot2. You really want one row for each City/Category combination, which is what summarize produces.

    Reproducible example:

    mtcars %>%
    head() %>%
      group_by(carb, gear) %>%
      mutate(total_wt = sum(wt)) %>%
      ungroup() -> mtcars_summary
        
    #mtcars_summary
    ## A tibble: 6 x 12
    #    mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb total_wt
    #  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl>
    #1  21       6   160   110  3.9   2.62  16.5     0     1     4     4     5.50
    #2  21       6   160   110  3.9   2.88  17.0     0     1     4     4     5.50
    #3  22.8     4   108    93  3.85  2.32  18.6     1     1     4     1     2.32
    #4  21.4     6   258   110  3.08  3.22  19.4     1     0     3     1     6.68
    #5  18.7     8   360   175  3.15  3.44  17.0     0     0     3     2     3.44
    #6  18.1     6   225   105  2.76  3.46  20.2     1     0     3     1     6.68
    

    Note above that mutate gives each row its group's total weight. ggplot2::geom_col will then stack all the rows it receives, leading to longer bars than you wanted. (Another hint is that the text looks "overplotted" -- this is because every character is being printed once for each row in the group -- ie you might have ten copies of the same text on top of each other, leading to poor anti-aliasing appearance.)

    ggplot(mtcars_summary, aes(total_wt, 
                               carb %>% as_factor %>% fct_reorder(total_wt), 
                               fill = as.character(gear))) +
      geom_col(position = "stack") +
      geom_text(aes(label = round(total_wt, 1)), position = position_dodge(1))
    

    enter image description here

    If we replace mutate with summarize, we get more what you expected, where the input going into the bars is not repeated for each element in your original data:

    #mtcars_summary
    ## A tibble: 4 x 3
    #   carb  gear total_wt
    #  <dbl> <dbl>    <dbl>
    #1     1     3     6.68
    #2     1     4     2.32
    #3     2     3     3.44
    #4     4     4     5.50
    

    enter image description here