
How to create a histogram for the data below in R


My data frame has 4 observations of 6 variables:

  MonthYr : 202101 202102 202103 202104
  Count1 : 123456 123425 123452 123455
  Count2 : 123456 123429 123453 123454
  Count3 : 123455 123428 123455 123455
  Count4 : 123455 123428 123455 123455
  Count5 : 123455 123428 123455 123455

Output: I have added an image of the graph I want. The data keeps growing: if I run the code next month, the new counts for May (and for later months) should be added automatically.

Can someone help me with histogram code in R?

Thank you in advance.


Solution

  • You can't really make a histogram with so few data points, but you could visualize the distribution of the "Count" values for each month using boxplots, e.g.

    # Create the dataframe
    df <- data.frame(MonthYr = c(202101, 202102, 202103, 202104),
               Count1 = c(123456, 123425, 123452, 123455),
               Count2 = c(123456, 123429, 123453, 123454),
               Count3 = c(123455, 123428, 123455, 123455),
               Count4 = c(123455, 123428, 123455, 123455),
               Count5 = c(123455, 123428, 123455, 123455))
    
    # Make MonthYr a factor
    df$MonthYr = factor(x = df$MonthYr,
           levels = c("202101", "202102", "202103", "202104"),
           labels = c("Jan - 2021", "Feb - 2021", "Mar - 2021", "Apr - 2021"))
    
    # Reshape the dataframe to the "long" format
    df2 <- reshape(df, varying = 2:6, v.names = c("Value"),
            direction = "long")
    
    # Plot the distribution of Counts for each MonthYr
    plot(x = df2$MonthYr, y = df2$Value, xlab = "Month - Year",
         ylab = "Values", main = "Distribution of Counts for each Timepoint")
    

    example_1.png

    For histograms:

    "Jan 2021" <- df2$Value[df2$MonthYr == "Jan - 2021"]
    "Feb 2021" <- df2$Value[df2$MonthYr == "Feb - 2021"]
    "Mar 2021" <- df2$Value[df2$MonthYr == "Mar - 2021"]
    "Apr 2021" <- df2$Value[df2$MonthYr == "Apr - 2021"]
    graphics.off()  # close any open graphics devices (dev.off() errors if none is open)
    par(mfrow = c(2, 2))
    hist(`Jan 2021`, las = 2, xlab = "")
    hist(`Feb 2021`, las = 2, xlab = "")
    hist(`Mar 2021`, las = 2, xlab = "")
    hist(`Apr 2021`, las = 2, xlab = "")
    

    example_2.png
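
    The four hard-coded vectors above could also be built with split(), so that the number of panels follows the number of months in the data. This is only a minimal sketch under the same data as above; the name values_by_month is an assumption introduced here for illustration.

    ```r
    # Recreate the data frame from the answer
    df <- data.frame(MonthYr = c(202101, 202102, 203103 - 1000, 202104),
                     Count1 = c(123456, 123425, 123452, 123455),
                     Count2 = c(123456, 123429, 123453, 123454),
                     Count3 = c(123455, 123428, 123455, 123455),
                     Count4 = c(123455, 123428, 123455, 123455),
                     Count5 = c(123455, 123428, 123455, 123455))
    df$MonthYr <- factor(df$MonthYr,
                         levels = c("202101", "202102", "202103", "202104"),
                         labels = c("Jan - 2021", "Feb - 2021", "Mar - 2021", "Apr - 2021"))
    df2 <- reshape(df, varying = 2:6, v.names = "Value", direction = "long")

    # One numeric vector per month, named after the factor levels
    # (values_by_month is a hypothetical name, not from the original answer)
    values_by_month <- split(df2$Value, df2$MonthYr)

    # Two panels per row, with as many rows as the months require
    n_months <- length(values_by_month)
    par(mfrow = c(ceiling(n_months / 2), 2))
    for (month in names(values_by_month)) {
      hist(values_by_month[[month]], main = month, las = 2, xlab = "")
    }
    ```

    With this layout, a fifth month would simply add a third row of panels rather than requiring another hand-written hist() call.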

    Edit

    You can plot all 4 histograms on the same plot using e.g.

    par(mfrow = c(1, 1))  # reset the 2 x 2 layout used for the separate histograms
    breaks <- seq(min(df2$Value), max(df2$Value), 0.5)
    yaxis <- seq(1, 4, length.out = 20)
    plot(x = df2$Value, y = yaxis, type = "n", ylab = "Count", xlab = "Value")
    hist(`Jan 2021`, add = TRUE, breaks = breaks, col = 1, border = 1)
    hist(`Feb 2021`, add = TRUE, breaks = breaks, col = 2, border = 2)
    hist(`Mar 2021`, add = TRUE, breaks = breaks + 0.33, col = 3, border = 3)
    hist(`Apr 2021`, add = TRUE, breaks = breaks + 0.66, col = 4, border = 4)
    legend("topleft", c("Jan", "Feb", "Mar", "Apr"), fill = 1:4)
    

    example_3.png

    However, you can see that they overlap (I used a small offset in the breaks so they don't overlap completely). I think a better way of handling it would be to use the ggplot2 graphics library:

    library(ggplot2)
    ggplot(df2, aes(x = Value, fill = MonthYr)) +
      geom_bar() +
      scale_y_continuous(breaks = 1:10)
    

    example_4.png

    Or, with the bars placed side by side:

    ggplot(df2, aes(x = Value, fill = MonthYr)) +
      geom_bar(position = position_dodge(preserve = "single"))
    

    example_5.png

    Or faceted:

    ggplot(df2, aes(x = Value, fill = MonthYr)) +
      geom_bar() +
      facet_wrap(~MonthYr)
    

    example_6.png

    EDIT 2:

    Based on your comment below and the picture you have now provided, you don't want a histogram; you want a bar chart (barplot). Here is an example of a barplot using the ggplot2 library:

    # Create the dataframe
    df <- data.frame(MonthYr = c(202101, 202102, 202103, 202104),
               Count1 = c(123456, 123425, 123452, 123455),
               Count2 = c(123456, 123429, 123453, 123454),
               Count3 = c(123455, 123428, 123455, 123455),
               Count4 = c(123455, 123428, 123455, 123455),
               Count5 = c(123455, 123428, 123455, 123455))
    
    # Make MonthYr a factor
    df$MonthYr = factor(x = df$MonthYr,
           levels = c("202101", "202102", "202103", "202104"),
           labels = c("Jan - 2021", "Feb - 2021", "Mar - 2021", "Apr - 2021"))
    
    # Reshape the dataframe to the "long" format
    df2 <- reshape(df, varying = 2:6, v.names = c("Value"),
            direction = "long")
    
    df2$Count <- factor(df2$time,
                       levels = c(1, 2, 3, 4, 5),
                       labels = c("Count 1", "Count 2", "Count 3", "Count 4", "Count 5"))
    df2
    
    library(ggplot2)
    library(gridExtra)
    
    plot1 <- ggplot(df2, aes(x = MonthYr, y = Value, fill = Count)) +
      geom_bar(width = 0.5, position = position_dodge(0.7), stat = "identity") +
      coord_cartesian(ylim = c(123405, 123460)) +
      theme_dark(base_size = 16) +
      theme(axis.title = element_blank(),
            legend.position = "bottom")
    
    table_theme <- ttheme_default(base_size = 14, padding = unit(c(8, 8), "mm"))
    table1 <- tableGrob(df, rows = NULL, theme = table_theme)
    grid.arrange(plot1, table1, nrow = 2, heights = c(1, 0.75))
    

    example_7.png
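
    The question also asks for new months to be picked up automatically, whereas the code above hard-codes the month labels and the column range 2:6. One possible way around that, as a rough sketch: derive the labels from the MonthYr codes and select the Count columns by name. It assumes MonthYr is always a numeric YYYYMM code and that the count columns are named Count1, Count2, and so on. The helper label_months and the variable count_cols are hypothetical names introduced here, not part of the original answer.

    ```r
    # Hypothetical helper (not from the original answer): turn numeric YYYYMM
    # codes into a factor ordered by date, with labels such as "Jan - 2021".
    label_months <- function(month_yr) {
      codes <- sort(unique(month_yr))
      labels <- paste(month.abb[codes %% 100], "-", codes %/% 100)
      factor(month_yr, levels = codes, labels = labels)
    }

    # Same data as above; further months would just be extra rows
    df <- data.frame(MonthYr = c(202101, 202102, 202103, 202104),
                     Count1 = c(123456, 123425, 123452, 123455),
                     Count2 = c(123456, 123429, 123453, 123454),
                     Count3 = c(123455, 123428, 123455, 123455),
                     Count4 = c(123455, 123428, 123455, 123455),
                     Count5 = c(123455, 123428, 123455, 123455))
    df$MonthYr <- label_months(df$MonthYr)

    # Assumption: every column whose name starts with "Count" holds counts
    count_cols <- grep("^Count", names(df))

    # Reshape to the "long" format without hard-coding the column positions
    df2 <- reshape(df, varying = count_cols, v.names = "Value", direction = "long")
    df2$Count <- factor(df2$time,
                        levels = seq_along(count_cols),
                        labels = paste("Count", seq_along(count_cols)))
    ```

    The resulting df2 has the same columns as in the barplot code above, so the ggplot() call should work unchanged; only the y-axis limits in coord_cartesian() may need adjusting if later counts fall outside that range.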