Stacked Bar Plot ggplot2


I know this gets asked a lot, but I'm having trouble making a 100% stacked bar plot in R. I know there are tons of pages out there explaining how, but nothing is working and I think the data I'm importing isn't configured correctly, so basically I want to know what I'm doing wrong in that respect. The data I'm using looks like the data in the attached picture. I'm able to create the exact chart I want in Excel, which I've also attached (the bar graph on the right; I couldn't attach more than one picture so they're just both in the same one), but for various reasons I need it to be in R. Is the way the data is written in Excel incorrect, and if so, how do I make it right?

[Image: data being used on the left, correct Excel graph on the right]


Solution

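  • In ggplot2 at least, you need to convert your data from "wide" to "long" format. Below, I use tidyr::gather() to "gather" the two data columns ("running" and "jumping") into a single "fraction" column, with a new "activity" column recording which column each value came from; you can then fill by "activity". Because the two fractions in each week sum to 1, the default stacking in geom_col() already gives a 100% stacked bar.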

    library(magrittr)                       # For pipe (%>%)
    
    dat <- tibble::tibble(
      weeks = 1:15,
      running = runif(15, 0, 1),
      jumping = 1 - running
    )
    
    dat
    #> # A tibble: 15 x 3
    #>    weeks running jumping
    #>    <int>   <dbl>   <dbl>
    #>  1     1  0.675   0.325 
    #>  2     2  0.727   0.273 
    #>  3     3  0.430   0.570 
    #>  4     4  0.324   0.676 
    #>  5     5  0.809   0.191 
    #>  6     6  0.260   0.740 
    #>  7     7  0.433   0.567 
    #>  8     8  0.872   0.128 
    #>  9     9  0.0288  0.971 
    #> 10    10  0.903   0.0970
    #> 11    11  0.295   0.705 
    #> 12    12  0.538   0.462 
    #> 13    13  0.342   0.658 
    #> 14    14  0.291   0.709 
    #> 15    15  0.877   0.123
    
    library(ggplot2)
    
    dat_long <- dat %>%
      tidyr::gather(activity, fraction, running, jumping)
    
    dat_long
    #> # A tibble: 30 x 3
    #>    weeks activity fraction
    #>    <int> <chr>       <dbl>
    #>  1     1 running    0.675 
    #>  2     2 running    0.727 
    #>  3     3 running    0.430 
    #>  4     4 running    0.324 
    #>  5     5 running    0.809 
    #>  6     6 running    0.260 
    #>  7     7 running    0.433 
    #>  8     8 running    0.872 
    #>  9     9 running    0.0288
    #> 10    10 running    0.903 
    #> # ... with 20 more rows
    
    ggplot(dat_long) +
      aes(x = factor(weeks), y = fraction, fill = activity) +
      geom_col()
    
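    If your spreadsheet holds raw counts rather than fractions, the bars above will not all reach the same height. Here is a minimal sketch of one way to handle that, assuming a made-up `counts` table (not the data from the question): reshape with tidyr::pivot_longer(), which is the current replacement for gather(), and let position = "fill" rescale each bar to 100%.

    ```r
    library(ggplot2)

    # Hypothetical raw counts; the column names mirror the example above
    counts <- tibble::tibble(
      weeks   = 1:5,
      running = c(12, 30, 7, 18, 25),
      jumping = c(8, 10, 21, 6, 25)
    )

    # Wide -> long: one row per week/activity pair
    counts_long <- tidyr::pivot_longer(
      counts,
      cols      = c(running, jumping),
      names_to  = "activity",
      values_to = "count"
    )

    # position = "fill" rescales every stack so that it sums to 1
    p <- ggplot(counts_long) +
      aes(x = factor(weeks), y = count, fill = activity) +
      geom_col(position = "fill") +
      scale_y_continuous(labels = scales::percent) +
      labs(x = "weeks", y = "share")

    p
    ```

    With position = "fill" you do not have to compute the fractions yourself, so the same code works whether the columns hold counts or proportions.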

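    You can also do this in base R. barplot() expects a matrix with one row per stacked segment and one column per bar, so I drop the first ("weeks") column with [, -1] and transpose the rest with t(). (The values below differ from the tibble above, presumably because the random data were regenerated.)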

    dat_tmat <- t(as.matrix(dat[, -1]))
    dat_tmat
    #>              [,1]      [,2]      [,3]      [,4]       [,5]      [,6]
    #> running 0.5227949 0.5352537 0.5879579 0.2678927 0.93068128 0.2948861
    #> jumping 0.4772051 0.4647463 0.4120421 0.7321073 0.06931872 0.7051139
    #>               [,7]      [,8]      [,9]       [,10]      [,11]     [,12]
    #> running 0.07729363 0.8925416 0.5503279 0.007479232 0.02991765 0.5832765
    #> jumping 0.92270637 0.1074584 0.4496721 0.992520768 0.97008235 0.4167235
    #>             [,13]     [,14]     [,15]
    #> running 0.8660134 0.1156794 0.3176998
    #> jumping 0.1339866 0.8843206 0.6823002
    
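    # names.arg labels each bar with its week number
    barplot(dat_tmat, names.arg = dat$weeks, col = c("blue", "red"))
    # fill = draws colored boxes that match the bars
    legend("topleft", rownames(dat_tmat), fill = c("blue", "red"), bg = "white")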
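
    The base R version can be normalised to 100% in a similar way when the matrix holds raw counts. This is only a sketch with a hypothetical `counts_mat` (not the data from the question); prop.table() with margin = 2 divides each column by its own total.

    ```r
    # Hypothetical raw counts: one row per activity, one column per week
    counts_mat <- rbind(
      running = c(12, 30, 7, 18, 25),
      jumping = c(8, 10, 21, 6, 25)
    )
    colnames(counts_mat) <- 1:5

    # margin = 2 works column by column, so every column sums to 1
    share_mat <- prop.table(counts_mat, margin = 2)

    barplot(
      share_mat * 100,
      col = c("blue", "red"),
      xlab = "weeks",
      ylab = "share (%)",
      legend.text = rownames(share_mat),
      args.legend = list(x = "topleft", bg = "white")
    )
    ```

    Passing legend.text to barplot() keeps the legend labels and colors in sync with the matrix rows, so a separate legend() call is not needed here.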