I have a dataframe which contains many binary categorical variables, and I would like to display a heatmap-like plot for all the observations, only displaying two colors for "yes" and "no" levels. I would then like to sort it so that those observations (ID) with the most "yes" in their row appear on top.
The sample dataset is provided here:
df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
var1 = c('yes', 'yes', 'no', 'yes', 'no'),
var2 = c('no', 'yes', 'no', 'yes', 'no'),
var3 = c('yes', 'no', 'no', 'yes', 'yes'))
df1
ID var1 var2 var3
1 1 yes no yes
2 2 yes yes no
3 3 no no no
4 4 yes yes yes
5 5 no no yes
I tried using the heatmap() function but I could not make it work. Can you please help me with that?
You're on the right track with heatmap(). Turn the "yes" / "no" columns of your data frame into a matrix of 0s and 1s and disable some of the defaults, such as scaling and ordering.
mat1 <- 1*(df1[,-1]=="yes")
> mat1
var1 var2 var3
[1,] 1 0 1
[2,] 1 1 0
[3,] 0 0 0
[4,] 1 1 1
[5,] 0 0 1
# You only need this step if you want the IDs to be shown beside the plot.
# Use the ID column itself: rownames(df1) only matches the IDs by coincidence here.
rownames(mat1) <- df1$ID
> mat1
var1 var2 var3
1 1 0 1
2 1 1 0
3 0 0 0
4 1 1 1
5 0 0 1
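# reorder the rows by ascending rowSums before plotting; heatmap() draws the
# first row at the bottom, so the IDs with the most "yes" end up on top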
heatmap(mat1[order(rowSums(mat1)),], scale = "none", Rowv = NA, Colv = NA)
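If you want to check the ordering without reading it off the plot, here is a minimal sketch that rebuilds the sample data and prints the IDs in the order they should appear from top to bottom. The names yes_counts and bottom_to_top are just illustrative helpers, not part of the answer above, and it assumes heatmap() places the first matrix row at the bottom.

```r
# Rebuild the sample data from the question
df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  var1 = c('yes', 'yes', 'no', 'yes', 'no'),
                  var2 = c('no', 'yes', 'no', 'yes', 'no'),
                  var3 = c('yes', 'no', 'no', 'yes', 'yes'))

# 0/1 matrix labelled by ID
mat1 <- 1 * (df1[, -1] == "yes")
rownames(mat1) <- df1$ID

# Number of "yes" answers per ID (illustrative helper name)
yes_counts <- rowSums(mat1)

# Ascending order: this is the bottom-to-top order in the plot
bottom_to_top <- rownames(mat1)[order(yes_counts)]

# Reverse it to read the plot from the top down
print(rev(bottom_to_top))
# [1] "4" "2" "1" "5" "3"
```

IDs 1 and 2 are tied with two "yes" each; order() keeps ties in their original sequence, so ID 1 sits just below ID 2 in the plot.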
You can change the colour scheme by specifying the col parameter, like so:
heatmap(mat1[order(rowSums(mat1)),], scale = "none", Rowv = NA, Colv = NA, col=c("lightgrey", "tomato"))
If you would prefer the plot to read left-to-right (one column per ID), just transpose the matrix:
heatmap(t(mat1[order(rowSums(mat1)),]), scale = "none", Rowv = NA, Colv = NA)
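If heatmap()'s fixed layout gets in the way, the same picture can be drawn with base image() directly. This is only a sketch of an alternative, not what the answer above uses: top_down and z are illustrative names, and the row reversal is there because image() draws the first y position at the bottom.

```r
# Rebuild the sample data from the question
df1 <- data.frame(ID = c(1, 2, 3, 4, 5),
                  var1 = c('yes', 'yes', 'no', 'yes', 'no'),
                  var2 = c('no', 'yes', 'no', 'yes', 'no'),
                  var3 = c('yes', 'no', 'no', 'yes', 'yes'))
mat1 <- 1 * (df1[, -1] == "yes")
rownames(mat1) <- df1$ID

# Sort so the IDs with the most "yes" come first (illustrative name)
top_down <- mat1[order(-rowSums(mat1)), ]

# image() expects z[x, y] and draws y = 1 at the bottom, so reverse the
# rows and transpose: variables run along x, IDs along y
z <- t(top_down[nrow(top_down):1, ])

image(x = seq_len(nrow(z)), y = seq_len(ncol(z)), z = z,
      zlim = c(0, 1), col = c("lightgrey", "tomato"),
      axes = FALSE, xlab = "", ylab = "ID")
axis(1, at = seq_len(nrow(z)), labels = rownames(z), tick = FALSE)
axis(2, at = seq_len(ncol(z)), labels = colnames(z), las = 1, tick = FALSE)
```

Fixing zlim to c(0, 1) keeps "no" grey and "yes" red even if a subset of the data happens to contain only one of the two levels.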