Tags: r, ggplot2, mean, geom-bar

How do you plot multiple bars (each bar representing a column) that show the average value of each numerical column in R?


I have four columns that contain numerical values (hundreds of rows). I would like to plot a bar chart that shows the average value of each of those columns on one chart. So it would show four bars on one bar chart, and each bar would represent one column.

The columns are veryactive, fairlyactive, lightlyactive, and sedentary. I already know the mean of each column from the summary function, but I want to plot the means on a chart. Do I need another variable for one of the axes?

I was able to plot one of the columns as a bar chart with calories on the x axis, but I would just like to compare the means of all four columns within a single bar chart.

ggplot(Activity_Zero, aes(x = calories, y = veryactive))+
  stat_summary(geom = 'bar', fun = 'mean')

Here is a sample of my data: [image: tibble of my data]


Solution

  • It depends on how your data are formatted. I've provided two examples below.

    If you're starting with a table of summary values, you can do this:

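    library(ggplot2)

    levels <- c("veryactive", "fairlyactive", "lightlyactive", "sedentary")
    # A factor with explicit levels keeps the bars in this order (not alphabetical)
    df1 <- data.frame(activitylevel = factor(levels, levels = levels),
                      meancalories = c(3000, 2500, 2000, 1500))
    # geom_col() takes the bar heights directly from the y values
    ggplot(df1, aes(x = activitylevel, y = meancalories)) +
      geom_col()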
    

    Created on 2023-01-30 by the reprex package (v2.0.1)

    And if you're starting with your original data in long form, you can do this:

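    library(ggplot2)
    set.seed(1)  # make the simulated data reproducible
    levels <- c("veryactive", "fairlyactive", "lightlyactive", "sedentary")
    # Simulated long-form data: 20 calorie observations per activity level
    df2 <- data.frame(activitylevel = factor(rep(levels, each = 20),
                                             levels = levels),
                      calories = c(rnorm(20, 3000, 100),
                                   rnorm(20, 2500, 100),
                                   rnorm(20, 2000, 100),
                                   rnorm(20, 1500, 100)))
    # stat_summary() computes the mean of y within each x group and draws it as a bar
    ggplot(df2, aes(x = activitylevel, y = calories)) +
      stat_summary(geom = "col", fun = "mean")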
    


    Finally, if you're starting with your data in wide form (i.e. a column for each activity level), I'd suggest looking up the function tidyr::pivot_longer, which will reshape your data into the long form required for stat_summary.
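
    As a rough sketch of that wide-form case: the data frame `wide` below is made up for illustration (simulated values, not your data), and the names `activitylevel` and `value` for the reshaped columns are my own choices. The four column names are the ones from the question.

    ```r
    library(ggplot2)
    library(tidyr)

    # Hypothetical wide data: one column per activity level (simulated values)
    set.seed(1)
    wide <- data.frame(veryactive    = rnorm(20, 30, 5),
                       fairlyactive  = rnorm(20, 20, 5),
                       lightlyactive = rnorm(20, 200, 20),
                       sedentary     = rnorm(20, 900, 50))

    levels <- c("veryactive", "fairlyactive", "lightlyactive", "sedentary")

    # Stack the four columns into name/value pairs: one row per original cell
    long <- pivot_longer(wide,
                         cols = c(veryactive, fairlyactive, lightlyactive, sedentary),
                         names_to = "activitylevel",
                         values_to = "value")

    # Keep the bars in the original column order rather than alphabetical order
    long$activitylevel <- factor(long$activitylevel, levels = levels)

    # One bar per original column, with height equal to that column's mean
    p <- ggplot(long, aes(x = activitylevel, y = value)) +
      stat_summary(geom = "col", fun = "mean")
    p
    ```

    Listing the columns explicitly in `cols` means any other columns in your real data (calories, for example) are left alone instead of being stacked with the activity columns.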