r ggplot2 histogram

How to build a histogram in R with bars filled according to several binary-coded columns?


I am pretty new to ggplot2 and would like to draw a histogram of the number of articles published per year (or per 5 years) for a systematic review. I have a data frame like this:

Df <- data.frame(
  name = c("article1", "article2", "article3", "article4"),
  date = c(2004, 2009, 1999, 2007),
  question1 = c(1, 0, 1, 0),
  question2 = c(1, 1, 1, 1),
  question3 = c(1, 1, 1, 1),
  question4 = c(0, 0, 0, 0),
  question5 = c(1, 0, 1, 0),
  stringsAsFactors = FALSE
)

library(ggplot2)

ggplot(Df, aes(date)) +
  geom_histogram(binwidth = 5, color = "black")

In addition, I would like to fill each bar of the histogram according to the number of articles that covered a particular type of question (question 1 to 5, coded 1 if the question is present and 0 if it is absent). The difficulty is that I have 5 questions that I would like to show in a single diagram, and I don't know how to do that. I tried the fill argument and also tried geom_bar, but both attempts failed.

Thanks so much in advance for your help


Solution

  • Here is a way. It's a simple bar plot with ggplot.
    This type of problem generally has to do with reshaping the data. The data is in wide format, and ggplot2 needs it in long format. See this post on how to reshape the data from wide to long format.

    library(dplyr)
    library(tidyr)
    library(ggplot2)
    
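    # `t` is the test data frame built under "Test data" below
    t %>%
      select(-Code) %>%
      # Reshape wide to long: one row per article/question pair;
      # the 0/1 codes land in the default `value` column
      pivot_longer(
        cols = starts_with("Question"),
        names_to = "Question"
      ) %>%
      # Keep only the questions an article covers; geom_bar() counts rows,
      # so rows coded 0 would otherwise be counted too
      filter(value == 1) %>%
      mutate(Publication_date = factor(Publication_date)) %>%
      ggplot(aes(Publication_date, fill = Question)) +
      geom_bar() +
      xlab("Publication Date")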
    


    Test data

    set.seed(2021)
    n <- 200
    Code <- paste0("Article", 1:n)
    Publication_date <- sample(2000:2020, n, TRUE)
    Question <- replicate(5, rbinom(n, 1, 0.5))
    colnames(Question) <- paste0("Question", 1:5)
    
    t <- data.frame(Code, Publication_date)
    t <- cbind(t, Question)
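
The same wide-to-long idea can be sketched on the data frame from the question, grouped into 5-year periods as the question mentions. This is only a minimal sketch under some assumptions: the names `Df_long`, `covered`, `period_start` and `period` are made up for illustration, and aligning the periods with `floor(date / 5) * 5` is one possible binning choice, not necessarily the one the asker has in mind.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# The data frame from the question
Df <- data.frame(
  name = c("article1", "article2", "article3", "article4"),
  date = c(2004, 2009, 1999, 2007),
  question1 = c(1, 0, 1, 0),
  question2 = c(1, 1, 1, 1),
  question3 = c(1, 1, 1, 1),
  question4 = c(0, 0, 0, 0),
  question5 = c(1, 0, 1, 0),
  stringsAsFactors = FALSE
)

Df_long <- Df %>%
  # Wide to long: one row per article/question pair
  pivot_longer(
    cols = starts_with("question"),
    names_to = "question",
    values_to = "covered"   # hypothetical column name for the 0/1 code
  ) %>%
  # Keep only the questions an article actually covers (coded 1)
  filter(covered == 1) %>%
  # Assumed binning: label each year with its 5-year period,
  # e.g. 2004 -> "2000-2004"
  mutate(
    period_start = floor(date / 5) * 5,
    period = paste0(period_start, "-", period_start + 4)
  )

# Stacked bar chart: one bar per period, one fill colour per question
p <- ggplot(Df_long, aes(period, fill = question)) +
  geom_bar(color = "black") +
  labs(x = "Publication period", y = "Number of article/question pairs")
p
```

Note that each bar's total height counts article/question pairs, not articles: an article covering four questions contributes four rows. If the total height must equal the number of articles, the questions would need a separate panel each (for example with `facet_wrap(~ question)`) rather than one stacked bar.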