Tags: r, sparse-matrix, heatmap, missing-data

How to generate a filled heatmap from sparse matrix by filling in missing values


The data frame has three columns: id, Days, and sum. I would like to generate a heatmap of sum with id on the y axis and Days on the x axis. The problem is that the data is sparse, so the heatmap consists of discrete bars. I would like each bar to extend to the right so that the bars are solid: the color should change when sum changes and then stay the same until the next Days value to the right.

Here's an example to generate the type of plot that I'm making.

library(ggplot2)

set.seed(13)
x_id <- sample( LETTERS[1:5], 100, replace=TRUE, 
                prob=c(0.15, 0.2, 0.35, 0.1, 0.2) )
x_sum <- sample( c(5, 30, 60, 120, 180, 240, 360), 100, replace=TRUE, 
                   prob=c(.1, .1, .2, .2, .2, .1, .1) )
x_days <- sample.int(2000, 100, replace = TRUE)-1000

df <- data.frame(id = x_id, Days = x_days, sum = x_sum)

ggp <- ggplot(data = df, 
       mapping = aes(x = Days, 
                     y = id, 
                     fill = sum)) +
  geom_tile() +
  xlab(label = "Days") + ylab(label = 'id') +
  scale_fill_gradient(low = "blue", high = "red") 
print(ggp)

Example of sparse heatmap

I would like the colors to extend to the right. I believe this means that the data frame should be sorted by id and Days, and that rows would have to be added for each id to fill in the missing days, with sum set to the last observed value of sum for that id. But how do I add the rows for each id and fill in the missing values? The rightmost color should also be extended by a fixed length, say 30 days, so that it is more visible.

Also, the color map should indicate a critical value, say 180. For sums from zero up to the critical value, the colors should go from green (0) to yellow (179); for sums at or above the critical value, the colors should go from light red (180) to dark red (the maximum value, 360).

Update:

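Here is a solution for filling in the sparse matrix:

library(data.table)
library(tidyr)

# Convert the example data frame to a data.table keyed by id and Days
DT <- as.data.table(df)
setkey(DT, id, Days)

# Join onto every day from each id's first day to its last day + 30;
# days without an observation get sum = NA
DT_fill_NA <- DT[setkey(DT[, .(min(Days):(max(Days)+30)), by = id], id, V1)]

# Carry the last observed sum forward into the NA rows
DT_fill <- fill(DT_fill_NA, c('sum'), .direction = "down")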

ggp <- ggplot(data = DT_fill, 
              mapping = aes(x = Days, 
                            y = id, 
                            fill = sum)) +
  geom_tile() +
  xlab(label = "Days") + ylab(label = 'id') +
  scale_fill_gradient(low = "blue", high = "red") 
print(ggp)

This creates the figure with each sparse bar extended to the right, up to the next bar.

Sparse Heatmap Filled to the Right
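
The fill step above can also be sketched with a data.table rolling join, which carries the last observation forward during the join itself and so avoids the separate tidyr::fill() call. This is only a minimal sketch on toy data: the names toy, grid, toy_fill and extend_by, and the values in toy, are made up for illustration and are not part of the original example.

```r
library(data.table)

# Toy data (hypothetical values): two ids with a few observed days each
toy <- data.table(id   = c("A", "A", "B", "B"),
                  Days = c(1L, 4L, 2L, 3L),
                  sum  = c(30, 180, 60, 5))

extend_by <- 2L  # assumed padding to the right of the last observation, in days

# One row per id for every day from its first observation to last + padding
grid <- toy[, .(Days = seq(min(Days), max(Days) + extend_by)), by = id]

# Rolling join: each day in the grid takes the most recent observed sum
# for the same id (last observation carried forward)
toy_fill <- toy[grid, on = .(id, Days), roll = TRUE]

print(toy_fill)
```

With roll = TRUE the last column named in on (here Days) is the one that is rolled, so values never leak from one id into another.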

Now the color map should be modified to indicate a critical value. Let the critical value be 180. For sums from zero up to the critical value, the colors should go from green (0) to yellow (179); for sums at or above the critical value, the colors should go from light red (180) to dark red (the maximum value, 360).

2nd Update

One way to generate a green-to-red scale with a break at 180 is as follows:

ggp <- ggplot(data = DT_fill, 
              mapping = aes(x = Days, 
                            y = id, 
                            fill = sum)) +
  geom_tile() +
  xlab(label = "Days") + ylab(label = 'id') +
  scale_fill_gradient2(low = "green", mid = "indianred2", high = "red2", 
                         midpoint = 180, breaks = c(50, 100, 200, 300)) +
  theme_bw()

print(ggp)

Sparse data extended right highlighting break point

I'm not sure this clearly identifies the break point at a specific value. How can the break between green and red be placed exactly at the critical value (180)?


Solution

  • Here is one approach to generate a filled heat map from a sparse matrix with a critical value highlighted.

    library(ggplot2)
    library(data.table)
    library(tidyr)
    
    set.seed(13)
    n_rows = 200
    x_id <- sample( LETTERS[1:5], n_rows, replace=TRUE, 
                    prob=c(0.15, 0.2, 0.35, 0.1, 0.2) )
    x_sum <- sample(        c(0,   5,   30, 60, 120, 180, 240, 270, 360), n_rows, replace=TRUE, 
                     prob=c(.05, .05, .1, .2, .2,  .2,  .1,  .05, .05) )
    x_days <- sample.int(2000, n_rows, replace = TRUE)-1000
    
    DT <- data.table(id = x_id, Days = x_days, sum = x_sum)
    
    setkey(DT, id, Days)
    DT_fill_NA <- DT[setkey(DT[, .(min(Days):(max(Days)+100)), by = id], id, V1)]
    
    DT_fill <- fill(DT_fill_NA, c('sum'), .direction = "down")
    
    
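    # Bin sum so that the critical value (180) is a bin edge. With right = TRUE the
    # bins are (a, b], so 180 itself lands in the last green bin, (100,180].
    brks = c(-1, 50, 100, 180, 250, 300, max(DT_fill$sum))
    DT_fill$sum_factors = cut(DT_fill$sum, breaks = brks, ordered_result = TRUE, right = TRUE)
    unique(DT_fill$sum_factors)  # list the bins that actually occur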
    
    ggp <- ggplot(data = DT_fill, 
                  mapping = aes(x = Days, 
                                y = id, 
                                fill = sum_factors)) +
      geom_tile() +
      xlab(label = "Days") + ylab(label = 'id') +
      scale_fill_manual(values = c("green4", "green3", "green", 
                                   "firebrick1", "firebrick3", "firebrick4")) +
      theme_bw()
    
    print(ggp)
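
If a continuous color scale is preferred over bins, one possible way to get a hard break is scale_fill_gradientn() with two color stops placed almost on top of each other at the critical value. The following is a sketch on toy data rather than the approach used above: the names toy, ggp_break, crit, max_sum and eps are hypothetical, the four colors are just one choice, and the upper limit of 360 is assumed from the question.

```r
library(ggplot2)

crit    <- 180   # critical value from the question
max_sum <- 360   # assumed upper limit of the color scale
eps     <- 1e-6  # tiny gap so the yellow and light-red stops nearly coincide

# Toy data (hypothetical): one tile per value so the break is easy to see
toy <- data.frame(Days = 1:6, id = "A",
                  sum  = c(0, 90, 179, 180, 270, 360))

ggp_break <- ggplot(toy, aes(x = Days, y = id, fill = sum)) +
  geom_tile() +
  scale_fill_gradientn(
    colours = c("green", "yellow", "indianred1", "darkred"),
    # position of each of the four colors on a 0-1 scale
    values  = c(0, crit - eps, crit, max_sum) / max_sum,
    limits  = c(0, max_sum)
  ) +
  theme_bw()

print(ggp_break)
```

Sums below 180 are interpolated between green and yellow, while sums from 180 upward are interpolated between light red and dark red, so the legend remains a continuous color bar with a visible jump at the critical value.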
    
