Plot a heatmap for binary categorical variables in R

I have a data frame that contains many binary categorical variables, and I would like to display a heatmap-like plot of all the observations, using only two colors for the "yes" and "no" levels. I would then like to sort the rows so that the observations (IDs) with the most "yes" values appear at the top.

The sample dataset is provided here:

df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
                   var1 = c('yes', 'yes', 'no', 'yes', 'no'),
                   var2 = c('no', 'yes', 'no', 'yes', 'no'),
                   var3 = c('yes', 'no', 'no', 'yes', 'yes'))
df1


  ID var1 var2 var3
1  1  yes   no  yes
2  2  yes  yes   no
3  3   no   no   no
4  4  yes  yes  yes
5  5   no   no  yes

I tried using the heatmap() function but I could not make it work. Can you please help me with that?

Solution

You're on the right track with heatmap(). Turn the "yes" / "no" columns of df1 into a matrix of 0s and 1s, and switch off the defaults that get in the way here: row scaling (scale = "none") and the dendrogram reordering of rows and columns (Rowv = NA, Colv = NA).

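# Drop the ID column, compare every cell to "yes", and multiply by 1 to turn TRUE/FALSE into 1/0
mat1 <- 1 * (df1[, -1] == "yes")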

> mat1
     var1 var2 var3
[1,]    1    0    1
[2,]    1    1    0
[3,]    0    0    0
[4,]    1    1    1
[5,]    0    0    1

# You only need this step if you want the IDs to be shown beside the plot.
# Use df1$ID rather than rownames(df1): the two only match here because the IDs happen to be 1 to 5.

rownames(mat1) <- df1$ID

> mat1
  var1 var2 var3
1    1    0    1
2    1    1    0
3    0    0    0
4    1    1    1
5    0    0    1
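The two steps above (0/1 conversion and ID labels) can be wrapped in a small function if you need them for several data frames. This is only a sketch: yes_no_matrix, id_col and yes are hypothetical names made up for illustration, and it assumes every non-ID column holds the same two labels, stored either as character or as factor.

```r
# Sample data from the question
df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  var1 = c('yes', 'yes', 'no', 'yes', 'no'),
                  var2 = c('no', 'yes', 'no', 'yes', 'no'),
                  var3 = c('yes', 'no', 'no', 'yes', 'yes'))

# Hypothetical helper (not part of base R): build a 0/1 matrix from yes/no columns
yes_no_matrix <- function(df, id_col = "ID", yes = "yes") {
  vars <- setdiff(names(df), id_col)      # every column except the ID
  m <- 1L * (as.matrix(df[vars]) == yes)  # as.matrix() turns factors into their labels
  rownames(m) <- df[[id_col]]             # label the rows with the IDs
  m
}

mat1 <- yes_no_matrix(df1)
mat1
```

The result should match the labelled matrix printed above, so it can be passed to heatmap() in the same way.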

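# Reorder the matrix by rowSums before plotting.
# heatmap() draws the first row of the matrix at the bottom, so ascending order puts the IDs with the most "yes" at the top.

heatmap(mat1[order(rowSums(mat1)), ], scale = "none", Rowv = NA, Colv = NA)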
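To check the row order without reading it off the plot, you can print the IDs in the order they will appear from top to bottom. The sketch below rebuilds the matrix from the question's sample data so it runs on its own; the tie-breaking variant at the end is an optional extra that was not asked for, and it assumes you want the lower ID on top when two rows have the same number of "yes".

```r
# Sample data from the question
df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  var1 = c('yes', 'yes', 'no', 'yes', 'no'),
                  var2 = c('no', 'yes', 'no', 'yes', 'no'),
                  var3 = c('yes', 'no', 'no', 'yes', 'yes'))
mat1 <- 1 * (df1[, -1] == "yes")
rownames(mat1) <- df1$ID

# Number of "yes" per ID
yes_count <- rowSums(mat1)

# Ascending order is what gets passed to heatmap(); reversing it gives top-to-bottom
ord <- order(yes_count)
top_to_bottom <- rev(rownames(mat1)[ord])
top_to_bottom
# [1] "4" "2" "1" "5" "3"

# Optional: break ties so that the lower ID ends up on top
ord_ties <- order(yes_count, -df1$ID)
top_to_bottom_ties <- rev(rownames(mat1)[ord_ties])
top_to_bottom_ties
# [1] "4" "1" "2" "5" "3"
```

Passing mat1[ord_ties, ] to heatmap() instead of mat1[ord, ] would then apply the tie-breaking rule to the plot.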

You can change the colour scheme with the col argument. With a 0/1 matrix and two colours, the first colour is used for "no" and the second for "yes":

heatmap(mat1[order(rowSums(mat1)),], scale = "none", Rowv = NA, Colv = NA, col=c("lightgrey", "tomato"))

If you would prefer the plot to read left-to-right (one column per ID), just transpose the matrix. The IDs with the most "yes" then end up on the right:

heatmap(t(mat1[order(rowSums(mat1)), ]), scale = "none", Rowv = NA, Colv = NA)
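If heatmap() adds more layout than you need (it reserves space for dendrograms even when they are switched off), a similar picture can be drawn with base image(). This is an alternative sketch rather than part of the answer above; the colours and axis settings are just one possible choice.

```r
# Sample data from the question
df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  var1 = c('yes', 'yes', 'no', 'yes', 'no'),
                  var2 = c('no', 'yes', 'no', 'yes', 'no'),
                  var3 = c('yes', 'no', 'no', 'yes', 'yes'))
mat1 <- 1 * (df1[, -1] == "yes")
rownames(mat1) <- df1$ID

# Same ordering as before: ascending, because the first row is drawn at the bottom
m <- mat1[order(rowSums(mat1)), ]

# image() expects z with one row per x value, hence the transpose
image(x = seq_len(ncol(m)), y = seq_len(nrow(m)), z = t(m),
      col = c("lightgrey", "tomato"), axes = FALSE,
      xlab = "", ylab = "ID")

# Label the columns with the variable names and the rows with the IDs
axis(1, at = seq_len(ncol(m)), labels = colnames(m), tick = FALSE)
axis(2, at = seq_len(nrow(m)), labels = rownames(m), las = 1, tick = FALSE)
box()
```

Because image() is a plain plotting call, the usual par() settings and axis() calls apply, which makes labels and margins easier to adjust than with heatmap().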