
How to create a ggplot bar chart with multiple columns of data for y?


My dataset looks like the sample below. I only want to plot the three numeric columns:

City     AverageClientUsage  AverageClientEst  EstimatedTraffic
Atlanta             2695.68           3555.62           2812.89
Boston               559.48           1080.49            583.81
Chicago             3314.44           5728.00           3458.56

I'd like ggplot to use City as the X axis and have three bars for each point on the x axis, one for AverageClientUsage, one for AverageClientEst, one for EstimatedTraffic. How do I go about doing this? At the end I'd like the ggplot to look like this:

[Image: target plot, created using Microsoft Excel]


Solution

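  • First, reshape your data frame to long format with pivot_longer() from the tidyr package:

    library(tidyr) # provides pivot_longer()
    library(dplyr) # provides the %>% pipe
    df_long <- df %>% pivot_longer(!City, names_to = "Type", values_to = "Count")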
    

    Afterwards, map fill to Type and pass position_dodge() to the position argument of geom_col(), so the three bars for each city sit side by side:

    library(ggplot2)
    ggplot(df_long, aes(x = City, y = Count, fill = Type)) + # map City to x, Count to y, Type to fill
      geom_col(position = position_dodge(0.7), width = 0.6, color = "black") + # position_dodge() places the bars side by side
      theme_minimal() + # add a ggplot theme
      theme(legend.position = "bottom", # move legend to bottom
            legend.title = element_blank(), # remove legend title
            axis.text.x = element_text(angle = 45, vjust = 0.5, color = "gray33"), # rotate x-axis text by 45 degrees, re-center it vertically, change color
            axis.text.y = element_text(color = "gray33"), # change y-axis text color
            axis.title = element_blank(), # remove axis titles
            panel.grid.major.x = element_blank()) + # remove vertical grid lines
      scale_fill_manual(values = c("blue", "darkorange2", "gray")) # set bar colors; unnamed values follow the alphabetical order of Type
    

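One thing to watch: when Type is a character column, ggplot2 orders the series alphabetically, so unnamed colors in scale_fill_manual() are assigned to AverageClientEst first rather than AverageClientUsage. Here is a minimal sketch of one way to pin both the bar order and the colors. It assumes you want the series in their original column order, and that blue, orange and gray should map to them in that order (that mapping is my guess, not something the question confirms). The names series_order and series_colors are only illustrative.

```r
library(tidyr)
library(ggplot2)

# Same sample values as in the question, rebuilt as a plain data frame
df <- data.frame(
  City = c("Atlanta", "Boston", "Chicago"),
  AverageClientUsage = c(2695.68, 559.48, 3314.44),
  AverageClientEst = c(3555.62, 1080.49, 5728),
  EstimatedTraffic = c(2812.89, 583.81, 3458.56)
)

# Long format: one row per City/Type pair
df_long <- pivot_longer(df, !City, names_to = "Type", values_to = "Count")

# Assumed desired order: the original column order, not alphabetical
series_order <- c("AverageClientUsage", "AverageClientEst", "EstimatedTraffic")
df_long$Type <- factor(df_long$Type, levels = series_order)

# Assumed color mapping; naming the vector ties each color to a series
series_colors <- c(
  AverageClientUsage = "blue",
  AverageClientEst = "darkorange2",
  EstimatedTraffic = "gray"
)

p <- ggplot(df_long, aes(x = City, y = Count, fill = Type)) +
  geom_col(position = position_dodge(0.7), width = 0.6, color = "black") +
  scale_fill_manual(values = series_colors)

print(p)
```

Converting Type to a factor controls the order of the bars and legend entries, while the named color vector keeps each color attached to its series even if that order changes later. The theme() adjustments from the answer can be added to p unchanged.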

    Data

    df <- structure(list(City = c("Atlanta", "Boston", "Chicago"), AverageClientUsage = c(2695.68, 
          559.48, 3314.44), AverageClientEst = c(3555.62, 1080.49, 5728
          ), EstimatedTraffic = c(2812.89, 583.81, 3458.56)), class = c("tbl_df", 
          "tbl", "data.frame"), row.names = c(NA, -3L))