How to create a histogram for the data below in R

My data frame has 4 observations of 6 variables:

  MonthYr : 202101 202102 202103 202104
  Count1 : 123456 123425 123452 123455
  Count2 : 123456 123429 123453 123454
  Count3 : 123455 123428 123455 123455
  Count4 : 123455 123428 123455 123455
  Count5 : 123455 123428 123455 123455

Output: [image: the graph I want]. The data keeps growing: when I run the code next month, the new counts for May (and for later months) should be added automatically.

Can someone help me with the histogram code in R?

Thank you in advance.


  • You can't really make a histogram with so few datapoints, but you could visualize the distribution of each "Count" variable using boxplots, e.g.

    # Create the dataframe
    df <- data.frame(MonthYr = c(202101, 202102, 202103, 202104),
               Count1 = c(123456, 123425, 123452, 123455),
               Count2 = c(123456, 123429, 123453, 123454),
               Count3 = c(123455, 123428, 123455, 123455),
               Count4 = c(123455, 123428, 123455, 123455),
               Count5 = c(123455, 123428, 123455, 123455))
    # Make MonthYr a factor
    df$MonthYr = factor(x = df$MonthYr,
           levels = c("202101", "202102", "202103", "202104"),
           labels = c("Jan - 2021", "Feb - 2021", "Mar - 2021", "Apr - 2021"))
    # Reshape the dataframe to the "long" format
    df2 <- reshape(df, varying = 2:6, v.names = c("Value"),
            direction = "long")
    # Plot the distribution of Counts for each MonthYr
    plot(x = df2$MonthYr, y = df2$Value, xlab = "Month - Year",
         ylab = "Values", main = "Distribution of Counts for each Timepoint")


    For histograms:

    "Jan 2021" <- df2$Value[df2$MonthYr == "Jan - 2021"]
    "Feb 2021" <- df2$Value[df2$MonthYr == "Feb - 2021"]
    "Mar 2021" <- df2$Value[df2$MonthYr == "Mar - 2021"]
    "Apr 2021" <- df2$Value[df2$MonthYr == "Apr - 2021"]
    par(mfrow = c(2, 2))
    hist(`Jan 2021`, las = 2, xlab = "")
    hist(`Feb 2021`, las = 2, xlab = "")
    hist(`Mar 2021`, las = 2, xlab = "")
    hist(`Apr 2021`, las = 2, xlab = "")
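
    The four per-month variables above have to be extended by hand whenever a month is added. A minimal sketch of a loop that avoids this, assuming the data is already in long format with one label column and one value column (the `long` and `by_month` names are made up for this example):

    ```r
    # Long-format values copied from the question: 5 counts for each of 4 months
    long <- data.frame(
      MonthYr = rep(c("Jan - 2021", "Feb - 2021", "Mar - 2021", "Apr - 2021"), times = 5),
      Value = c(123456, 123425, 123452, 123455,
                123456, 123429, 123453, 123454,
                123455, 123428, 123455, 123455,
                123455, 123428, 123455, 123455,
                123455, 123428, 123455, 123455)
    )
    # Keep months in order of first appearance rather than alphabetical order
    long$MonthYr <- factor(long$MonthYr, levels = unique(long$MonthYr))

    # One numeric vector per month, named by its label
    by_month <- split(long$Value, long$MonthYr)

    # Size the plotting grid from the number of months actually present
    n <- length(by_month)
    old_par <- par(mfrow = c(ceiling(n / 2), 2))
    for (month in names(by_month)) {
      hist(by_month[[month]], las = 2, xlab = "", main = month)
    }
    par(old_par)  # restore the previous layout
    ```

    With this layout, a fifth month would simply add a fifth panel without any change to the plotting code.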



    You can plot all 4 histograms on the same plot using e.g.

    breaks <- seq(min(df2$Value), max(df2$Value), 0.5)
    yaxis <- seq(1, 4, length.out = 20)
    plot(x = df2$Value, y = yaxis, type = "n", ylab = "Count", xlab = "Value")
    hist(`Jan 2021`, add = TRUE, breaks = breaks, col = 1, border = 1)
    hist(`Feb 2021`, add = TRUE, breaks = breaks, col = 2, border = 2)
    hist(`Mar 2021`, add = TRUE, breaks = breaks + 0.33, col = 3, border = 3)
    hist(`Apr 2021`, add = TRUE, breaks = breaks + 0.66, col = 4, border = 4)
    legend("topleft", c("Jan", "Feb", "Mar", "Apr"), fill = 1:4)


    However, you can see that they overlap (I used a small offset in the breaks so they don't overlap completely). I think a better way of handling it would be to use the ggplot2 graphics package, which has to be loaded with library(ggplot2) first:

    library(ggplot2)
    ggplot(df2, aes(x = Value, fill = MonthYr)) +
      geom_bar() +
      scale_y_continuous(breaks = 1:10)


    Or, with the bars dodged side by side:

    ggplot(df2, aes(x = Value, fill = MonthYr)) +
      geom_bar(position = position_dodge(preserve = "single"))


    Or faceted:

    ggplot(df2, aes(x = Value, fill = MonthYr)) +
      geom_bar() +
      # the original snippet is cut off here; faceting by MonthYr is assumed
      facet_wrap(~ MonthYr)


    EDIT 2:

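    Based on your comment below and the picture you have now provided, you don't want a histogram: you want a barchart/barplot. Here is an example of a barplot using the ggplot2 package, with the data table drawn underneath using gridExtra:

    library(ggplot2)    # ggplot(), geom_bar(), theme_dark()
    library(gridExtra)  # tableGrob(), ttheme_default(), grid.arrange()
    library(grid)       # unit()
    # Create the dataframe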
    df <- data.frame(MonthYr = c(202101, 202102, 202103, 202104),
               Count1 = c(123456, 123425, 123452, 123455),
               Count2 = c(123456, 123429, 123453, 123454),
               Count3 = c(123455, 123428, 123455, 123455),
               Count4 = c(123455, 123428, 123455, 123455),
               Count5 = c(123455, 123428, 123455, 123455))
    # Make MonthYr a factor
    df$MonthYr = factor(x = df$MonthYr,
           levels = c("202101", "202102", "202103", "202104"),
           labels = c("Jan - 2021", "Feb - 2021", "Mar - 2021", "Apr - 2021"))
    # Reshape the dataframe to the "long" format
    df2 <- reshape(df, varying = 2:6, v.names = c("Value"),
            direction = "long")
    df2$Count <- factor(df2$time,
                       levels = c(1, 2, 3, 4, 5),
                       labels = c("Count 1", "Count 2", "Count 3", "Count 4", "Count 5"))
    plot1 <- ggplot(df2, aes(x = MonthYr, y = Value, fill = Count)) +
      geom_bar(width = 0.5, position = position_dodge(0.7), stat = "identity") +
      coord_cartesian(ylim = c(123405, 123460)) +
      theme_dark(base_size = 16) +
      theme(axis.title = element_blank(),
            legend.position = "bottom")
    table_theme <- ttheme_default(base_size = 14, padding = unit(c(8, 8), "mm"))
    table1 <- tableGrob(df, rows = NULL, theme = table_theme)
    grid.arrange(plot1, table1, nrow = 2, heights = c(1, 0.75))
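
    The levels and labels in the factor() call above are typed out by hand, so a new month such as May would need a manual edit. A minimal sketch of one way around that, assuming MonthYr always arrives as a YYYYMM number; `month_yr_factor` is a hypothetical helper name, not part of the original answer:

    ```r
    # Hypothetical helper: turn numeric YYYYMM values into a factor with
    # labels such as "Jan - 2021", ordered chronologically.
    month_yr_factor <- function(month_yr) {
      # Append a day of month so base R can parse the value as a Date
      dates <- as.Date(paste0(month_yr, "01"), format = "%Y%m%d")
      # month.abb holds English abbreviations, so labels do not depend on locale
      labels <- paste(month.abb[as.integer(format(dates, "%m"))], "-",
                      format(dates, "%Y"))
      # Levels follow calendar order, whatever order the rows arrive in
      factor(labels, levels = unique(labels[order(dates)]))
    }
    ```

    It could stand in for the hard-coded call as `df$MonthYr <- month_yr_factor(df$MonthYr)`, so that a row for 202105 picks up the label "May - 2021" without further changes.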
