
Replicating a trending chart with ggplot


I recently saw a chart that I want to replicate in R. It shows a score (or some other measurement) for multiple records as colored boxes, with each score binned into one of, say, 4 colors; in my image these are red, light red, light green, and green. Each record has one score per point in time over several points in time, so each record gets one box per score. For my example I'll use student test scores: with 4 students and 8 tests during the year (in chronological order), each student's row has 8 boxes, for 32 boxes in total.
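The layout described above amounts to long-format data with one row per (student, test) pair, where each row becomes one box. A minimal sketch of that grid, assuming 4 students labelled A to D and 8 tests, using base R's `expand.grid()` (the name `boxes` is just illustrative):

```r
# One row per (Student, Test) combination; each row will become one colored box.
# expand.grid() varies its first argument fastest, so Test runs 1..8 within each student.
boxes <- expand.grid(Test = 1:8, Student = c("A", "B", "C", "D"),
                     stringsAsFactors = FALSE)
nrow(boxes)            # 32 boxes in total (4 students x 8 tests)
table(boxes$Student)   # 8 boxes per student
```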

[Image: hacked-up attempt of the chart]

Here is how I created some example data:

totallynotrealdata <- data.frame(Student = rep(c("A", "B", "C", "D"), each = 8),
                                 Test    = rep(1:8, 4),
                                 Score   = sample(1:99, 32, replace = TRUE))
# Bin the scores once the data frame exists (it cannot reference itself while being built)
totallynotrealdata$BinnedScore <- cut(totallynotrealdata$Score,
                                      breaks = c(0, 25, 50, 75, 100), labels = c(1, 2, 3, 4))
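`cut()` uses right-closed intervals by default (`right = TRUE`), so the breaks `c(0, 25, 50, 75, 100)` give the bins (0,25], (25,50], (50,75] and (75,100]. A quick check on a few boundary scores shows where each one lands (the `scores` vector here is made up for illustration):

```r
# Hypothetical boundary scores, chosen to sit on and just above each break
scores <- c(1, 25, 26, 50, 51, 75, 76, 99)
bins <- cut(scores, breaks = c(0, 25, 50, 75, 100), labels = c(1, 2, 3, 4))
data.frame(score = scores, bin = bins)
# 25 falls in bin 1 and 26 in bin 2, because each interval includes its upper bound.
# A score of exactly 0 would become NA, but sample(1:99, ...) never produces 0.
```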

How can I recreate this chart in ggplot2? Are there any geoms I should look at?


Solution

  • You could use geom_rect(). This is very basic, but it should be easy to adapt to your purposes:

    library(ggplot2)

    df <- data.frame(Student = rep(1:4, each = 8),
                     Test    = rep(1:8, 4),
                     Score   = sample(1:99, 32, replace = TRUE))

    # Bin the scores into four classes: (0,25], (25,50], (50,75], (75,100]
    df$BinnedScore <- cut(df$Score, breaks = c(0, 25, 50, 75, 100), labels = c(1, 2, 3, 4))
    # Relabel the numeric student IDs as letters A, B, C, ...
    df$Student     <- factor(df$Student, labels = LETTERS[seq_along(unique(df$Student))])

    # Fill colors from the lowest to the highest bin: red, light red, light green, green
    colors   <- c("#f23d2e", "#e39e9c", "#bbd3a8", "#68f200")
    numStuds <- nlevels(df$Student)
    numTests <- max(df$Test)

    # Each score becomes a unit rectangle: column = test, row = student
    ggplot(df) +
      geom_rect(aes(xmin = Test - 1, xmax = Test,
                    ymin = as.numeric(Student) - 1, ymax = as.numeric(Student),
                    fill = BinnedScore),
                colour = grey(0.5)) +
      scale_fill_manual(name = "Binned score", values = colors, drop = FALSE) +
      xlab("Test") + ylab("Student") +
      # Place the tick labels at the center of each cell
      scale_y_continuous(breaks = seq(0.5, numStuds, 1), labels = levels(df$Student)) +
      scale_x_continuous(breaks = seq(0.5, numTests, 1), labels = 1:numTests)
    

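An alternative sketch, not part of the answer above: `geom_tile()` draws one rectangle centered on each (x, y) position, so the discrete Test and Student values can be mapped directly and `scale_fill_manual()` assigns the four colors, with no need to compute rectangle corners or custom axis breaks. The data setup mirrors the answer; `set.seed()` is added only so the random scores are reproducible.

```r
library(ggplot2)

set.seed(1)  # added only for reproducible example scores
df <- data.frame(Student = rep(c("A", "B", "C", "D"), each = 8),
                 Test    = rep(1:8, 4),
                 Score   = sample(1:99, 32, replace = TRUE))
df$BinnedScore <- cut(df$Score, breaks = c(0, 25, 50, 75, 100),
                      labels = c(1, 2, 3, 4))

colors <- c("#f23d2e", "#e39e9c", "#bbd3a8", "#68f200")  # lowest to highest bin

p <- ggplot(df, aes(x = factor(Test), y = Student, fill = BinnedScore)) +
  geom_tile(colour = "grey50") +                      # grey border around each box
  scale_fill_manual(values = colors, drop = FALSE) +  # keep all 4 bins in the legend
  labs(x = "Test", y = "Student", fill = "Binned score") +
  coord_equal()                                       # square boxes
print(p)
```

Because Student is mapped to a discrete axis, student A appears at the bottom; reorder the factor levels of Student if you want A at the top.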