Tags: r, ggplot2, plot, heatmap, categorical-data

Plot a heatmap for binary categorical variables in R


I have a data frame with many binary categorical variables, and I would like to display a heatmap-like plot of all the observations that uses only two colors, one for the "yes" level and one for the "no" level. I would also like to sort the rows so that the observations (IDs) with the most "yes" values appear at the top.

The sample dataset is provided here:

df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
                   var1 = c('yes', 'yes', 'no', 'yes', 'no'),
                   var2 = c('no', 'yes', 'no', 'yes', 'no'),
                   var3 = c('yes', 'no', 'no', 'yes', 'yes'))
df1


  ID var1 var2 var3
1  1  yes   no  yes
2  2  yes  yes   no
3  3   no   no   no
4  4  yes  yes  yes
5  5   no   no  yes

I tried using the heatmap() function but I could not make it work. Can you please help me with that?
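For reference, the sorting criterion amounts to counting the "yes" values in each row. A minimal base-R sketch on the sample data (the variable names yes_count and ids_by_yes are only illustrative):

```r
df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  var1 = c('yes', 'yes', 'no', 'yes', 'no'),
                  var2 = c('no', 'yes', 'no', 'yes', 'no'),
                  var3 = c('yes', 'no', 'no', 'yes', 'yes'))

# Number of "yes" answers per row, ignoring the ID column
yes_count <- rowSums(df1[, -1] == "yes")
names(yes_count) <- df1$ID
print(yes_count)   # 2 2 0 3 1 for IDs 1 to 5

# IDs ordered from most to fewest "yes" values
ids_by_yes <- df1$ID[order(-yes_count)]
print(ids_by_yes)  # 4 1 2 5 3
```

order() leaves tied rows (IDs 1 and 2 here) in their original order.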


Solution

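  • You're on the right track with heatmap. Turn the "yes"/"no" columns of your data frame into a numeric matrix of 0s and 1s, and switch off the defaults that would distort the plot: row scaling (scale = "none") and the dendrogram-based reordering of rows and columns (Rowv = NA, Colv = NA).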

    mat1 <- 1 * (df1[, -1] == "yes")   # logical matrix; multiplying by 1 gives 0/1
    
    > mat1
         var1 var2 var3
    [1,]    1    0    1
    [2,]    1    1    0
    [3,]    0    0    0
    [4,]    1    1    1
    [5,]    0    0    1
    
    # Label the rows with the IDs; without this, heatmap() labels the rows
    # by their position in the reordered matrix rather than by ID
    
    rownames(mat1) <- df1$ID
    
    > mat1
      var1 var2 var3
    1    1    0    1
    2    1    1    0
    3    0    0    0
    4    1    1    1
    5    0    0    1
    
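    # Sort the rows by their number of "yes" values; heatmap() draws the first
    # row at the bottom, so ascending order puts the ID with the most "yes" on top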
    
    heatmap(mat1[order(rowSums(mat1)),], scale = "none", Rowv = NA, Colv = NA)
    

    [Figure: resulting heatmap]

    You can change the colour scheme by passing two colours (the first for 0 = "no", the second for 1 = "yes") to the col argument:

    heatmap(mat1[order(rowSums(mat1)),], scale = "none", Rowv = NA, Colv = NA, col=c("lightgrey", "tomato"))
    

    If you would prefer the plot to read left-to-right (one column per ID), transpose the matrix; the ID with the most "yes" values then appears in the rightmost column:

    heatmap(t(mat1[order(rowSums(mat1)),]), scale = "none", Rowv = NA, Colv = NA)
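For reuse, the steps in this answer can be collected into one function. This is a minimal sketch, assuming the ID column is named ID and the answers are exactly "yes"/"no"; the function name plot_yes_no_heatmap and its arguments are hypothetical, not part of base R. It returns the sorted matrix invisibly, which makes the row order easy to check.

```r
# Hypothetical helper (name and arguments are illustrative, not base R)
plot_yes_no_heatmap <- function(df, id_col = "ID", yes = "yes",
                                cols = c("lightgrey", "tomato")) {
  vars <- setdiff(names(df), id_col)
  # TRUE/FALSE matrix of "is this answer yes?", turned into 1/0
  mat <- 1 * (as.matrix(df[, vars, drop = FALSE]) == yes)
  # Use the IDs as row labels so they appear beside the plot
  rownames(mat) <- df[[id_col]]
  # heatmap() draws the first row at the bottom, so sorting by ascending
  # row sums puts the ID with the most "yes" values at the top
  mat <- mat[order(rowSums(mat)), , drop = FALSE]
  # No scaling, no dendrograms or reordering; cols[1] for 0, cols[2] for 1
  heatmap(mat, scale = "none", Rowv = NA, Colv = NA, col = cols)
  invisible(mat)
}

df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  var1 = c('yes', 'yes', 'no', 'yes', 'no'),
                  var2 = c('no', 'yes', 'no', 'yes', 'no'),
                  var3 = c('yes', 'no', 'no', 'yes', 'yes'))

res <- plot_yes_no_heatmap(df1)
print(rownames(res))  # "3" "5" "1" "2" "4", i.e. bottom to top in the plot
```

Keeping the conversion, labelling and sorting in one place ensures the row labels always travel with their rows when the matrix is reordered.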