r ggplot2 bar-chart frequency stacked-chart

Stacked Barplot by two conditions with frequency data


I am new to R and trying to create a stacked barplot with frequency data. Sorry if a similar thing has been asked before, but I can't figure it out!

Sample data:

          burn dist perc_bg perc_moss perc_litter
 Site 1   b     0   0.6      0.4      0.0
 Site 1   b     3   0.2      0.7      0.1
 Site 1   b    10   0.3      0.4      0.3
 Site 2   u     0   0.7      0.2      0.1
 Site 2   u     3   0.4      0.3      0.3 
 Site 2   u    10   0.1      0.2      0.7
 Site 3   b     0   0.2      0.4      0.4
 Site 3   b     3   0.3      0.6      0.1
 Site 3   b    10   0.2      0.3      0.5
 Site 4   u     0   0.7      0.2      0.1
 Site 4   u     3   0.5      0.4      0.1
 Site 4   u    10   0.3      0.2      0.5

I want to create stacked bar plots by distance and burn: 3 stacked bars (distances 0, 3 and 10) for burned (b) sites and 3 stacked bars (distances 0, 3 and 10) for unburned (u) sites, with fill = cover type (perc_bg, perc_moss, perc_litter). So I need to take the average frequency of each cover type at each distance, grouped by burn zone, and I am lost. Any help would be much appreciated.


Solution

  • Here is a tidyverse solution. I recommend the tidyverse website for lots more information.

    This assumes your data is a data frame named mydata, with the sites in a column named site (see the end of this answer for how I made the example data).

    Install the packages if required using install.packages('tidyverse'), then load them:

    library(dplyr)
    library(tidyr)
    library(ggplot2)
    

    Now the first issue is that your cover type is in 3 columns ("wide" format) and you want "long" format - one column for the type, one for the value. You can use tidyr::pivot_longer() for that:

    mydata %>% 
      pivot_longer(cols = 4:6, names_to = "cover_type")
    

    Run that and note the result. Note the use of pipes - %>% - to pass data through a series of steps.
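
    As a small illustration of what the pipe does, the sketch below compares a nested call with a piped one. The vector x is made up for this example and is not part of the question's data; the two results should be identical.

    ```r
    library(dplyr)  # dplyr re-exports the %>% pipe from magrittr

    # Hypothetical values, used only to illustrate the pipe
    x <- c(0.6, 0.2, 0.3)

    # Nested call: read from the inside out
    nested <- round(mean(x), 2)

    # Piped call: read left to right, each result feeds the next function
    piped <- x %>% mean() %>% round(2)

    identical(nested, piped)
    ```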

    You probably want to remove the perc_ prefix, and convert dist to categories (factors in R), since bar plots have a categorical x-axis. Use dplyr::mutate() for that:

    mydata %>% 
      pivot_longer(cols = 4:6, names_to = "cover_type") %>% 
      mutate(cover_type = gsub("perc_", "", cover_type), dist = factor(dist))
    

    Again, run and note the result.
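
    To see what the two mutate() steps do in isolation, here is a small sketch on stand-alone vectors. The names cover, dist_num, cover_clean and dist_cat are illustrative only; they are not columns created by the code above. It uses sub() with an anchored pattern as a slightly stricter alternative to the gsub() call.

    ```r
    # Illustrative vectors mirroring the column names and distances in the question
    cover <- c("perc_bg", "perc_moss", "perc_litter")
    dist_num <- c(0, 3, 10, 0, 3, 10)

    # Strip the prefix; "^" anchors the match to the start of the string
    cover_clean <- sub("^perc_", "", cover)

    # factor() turns the numbers into categories; levels follow numeric order
    dist_cat <- factor(dist_num)

    cover_clean
    levels(dist_cat)
    ```

    Because the levels are built from the sorted numeric values, 10 is placed after 3 on the x-axis rather than being sorted as text.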

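    Finally we can pass the data to ggplot. We want to plot value versus dist, fill by cover_type and facet (separate plots side by side) by burn. Use position_fill() to rescale each bar so that it sums to 1. Each burn and distance combination contains two sites, and every row of the sample data sums to 1, so the rescaled segment heights equal the average proportion of each cover type.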

    mydata %>% 
      pivot_longer(cols = 4:6, names_to = "cover_type") %>% 
      mutate(cover_type = gsub("perc_", "", cover_type), dist = factor(dist)) %>% 
      ggplot(aes(dist, value)) + 
      geom_col(aes(fill = cover_type), position = position_fill()) + 
      facet_wrap(~burn) +
      labs(title = "Cover type by distance and burn")
    

    The result is shown below. This is just the basics; there are many ways to customise the plot.

    [Figure: stacked bar chart of cover type by distance, faceted by burn]
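
    The question asks for the average of each cover type per distance and burn zone. If you would rather compute those means explicitly than rely on the bars being rescaled, a minimal sketch follows. It rebuilds the sample data so that it runs on its own; the names avg_cover, mean_value and p are my own choices, and starts_with("perc_") is used in place of the column positions 4:6.

    ```r
    library(dplyr)
    library(tidyr)
    library(ggplot2)

    # Same values as the sample data in the question
    mydata <- data.frame(
      site = rep(paste("Site", 1:4), each = 3),
      burn = rep(c("b", "u", "b", "u"), each = 3),
      dist = rep(c(0, 3, 10), times = 4),
      perc_bg     = c(0.6, 0.2, 0.3, 0.7, 0.4, 0.1, 0.2, 0.3, 0.2, 0.7, 0.5, 0.3),
      perc_moss   = c(0.4, 0.7, 0.4, 0.2, 0.3, 0.2, 0.4, 0.6, 0.3, 0.2, 0.4, 0.2),
      perc_litter = c(0.0, 0.1, 0.3, 0.1, 0.3, 0.7, 0.4, 0.1, 0.5, 0.1, 0.1, 0.5)
    )

    # Reshape to long format, then average over sites within each
    # burn / distance / cover type combination
    avg_cover <- mydata %>%
      pivot_longer(cols = starts_with("perc_"), names_to = "cover_type") %>%
      mutate(cover_type = sub("^perc_", "", cover_type), dist = factor(dist)) %>%
      group_by(burn, dist, cover_type) %>%
      summarise(mean_value = mean(value), .groups = "drop")

    # One stacked bar per distance, one panel per burn zone
    p <- ggplot(avg_cover, aes(dist, mean_value, fill = cover_type)) +
      geom_col() +
      facet_wrap(~burn) +
      labs(y = "Mean proportion of cover",
           title = "Cover type by distance and burn")
    p
    ```

    Having the means in a data frame (avg_cover) also makes it easy to inspect or export the numbers behind the plot.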

    The data:

    mydata <- read.table(text = "site burn dist perc_bg perc_moss perc_litter
     'Site 1'   b     0   0.6      0.4      0.0
     'Site 1'   b     3   0.2      0.7      0.1
     'Site 1'   b    10   0.3      0.4      0.3
     'Site 2'   u     0   0.7      0.2      0.1
     'Site 2'   u     3   0.4      0.3      0.3 
     'Site 2'   u    10   0.1      0.2      0.7
     'Site 3'   b     0   0.2      0.4      0.4
     'Site 3'   b     3   0.3      0.6      0.1
     'Site 3'   b    10   0.2      0.3      0.5
     'Site 4'   u     0   0.7      0.2      0.1
     'Site 4'   u     3   0.5      0.4      0.1
     'Site 4'   u    10   0.3      0.2      0.5", header = TRUE)