I'm having trouble formatting a heatmap of some gene expression data. I would like the axis to show the actual sample labels instead of just sample1, sample2, etc. I would also like the heatmap to use a random sample of 50 genes instead of just the first 50. The code and data set are linked below.
library(dplyr)
library(tidyr)
library(ggplot2)
set.seed(1234)
nci <- read.csv('/Users/KyleHammerberg/Desktop/ML Extra Credit/nci.datanames.csv')
# create a matrix of random values: 3200 values / 64 columns = 50 genes x 64 samples
mat <- matrix(rexp(3200, rate=1), ncol=64)
rownames(mat) <- paste0('gene',1:nrow(mat))
colnames(mat) <- paste0('sample',1:ncol(mat))
mat[1:10,1:10]
# convert to data.frame and gather
mat <- as.data.frame(mat)
mat$gene <- rownames(mat)
mat <- mat %>% gather(key='sample', value='value', -gene)
ggplot(mat, aes(sample, gene)) +
  geom_tile(aes(fill = value)) +
  theme(axis.text.x = element_text(angle = 90, size = 4),
        axis.text.y = element_text(size = 4))
Data set: https://github.com/khammerberg53/MLEC/blob/main/nci.datanames.csv
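Note that the matrix in the code above has 3200 / 64 = 50 rows, so it already contains exactly 50 genes and nothing is being subsetted. To draw a random subset from a larger matrix, one option is to sample row indices before reshaping. The sketch below assumes a hypothetical 200-gene toy matrix standing in for the real CSV; the `keep`, `sub` and `long` names are illustrative only.

```r
library(dplyr)
library(tidyr)

set.seed(1234)

# Hypothetical toy data: 200 genes x 64 samples (stand-in for the real CSV)
mat <- matrix(rexp(200 * 64, rate = 1), ncol = 64)
rownames(mat) <- paste0("gene", seq_len(nrow(mat)))
colnames(mat) <- paste0("sample", seq_len(ncol(mat)))

# Draw 50 distinct row indices at random instead of taking rows 1:50
keep <- sample(nrow(mat), 50)
sub <- mat[keep, ]

# Reshape to long format (one row per gene/sample pair) for geom_tile()
long <- as.data.frame(sub) %>%
  mutate(gene = rownames(sub)) %>%
  pivot_longer(-gene, names_to = "sample", values_to = "value")

head(long)
```

`sample()` draws without replacement by default, so the 50 genes are distinct; `set.seed()` makes the draw reproducible.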
Perhaps this will help:
library(tidyverse)

set.seed(1234)  # make the random gene selection reproducible

nci %>%
  rename(gene = X) %>%  # column X holds the gene names
  pivot_longer(-gene, names_to = "sample", values_to = "value") %>%
  filter(gene %in% sample(unique(gene), 50)) %>%  # keep 50 randomly chosen genes
  ggplot(aes(x = sample, y = gene, fill = value)) +
  geom_tile() +
  theme(axis.text.x = element_text(angle = 90, size = 4),
        axis.text.y = element_text(size = 4))
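One side effect worth knowing: ggplot2 places a character x variable in alphabetical order, so labels such as sample10 end up before sample2, and the real sample labels will not follow the column order of the file. A minimal sketch, assuming a hypothetical three-gene toy data frame with the same layout the answer relies on (a column `X` of gene names plus one column per sample), keeps the original order by turning `sample` into a factor:

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical toy data: column X holds gene names, one column per sample
toy <- data.frame(X = c("g1", "g2", "g3"),
                  sample2 = c(0.5, 1.2, 0.3),
                  sample10 = c(2.1, 0.7, 1.5),
                  sample1 = c(0.9, 0.4, 1.1))

long <- toy %>%
  rename(gene = X) %>%
  pivot_longer(-gene, names_to = "sample", values_to = "value") %>%
  # Factor levels in first-appearance order = original column order,
  # so the x axis is no longer sorted alphabetically
  mutate(sample = factor(sample, levels = unique(sample)))

p <- ggplot(long, aes(x = sample, y = gene, fill = value)) +
  geom_tile() +
  theme(axis.text.x = element_text(angle = 90, size = 4),
        axis.text.y = element_text(size = 4))

levels(long$sample)
```

The same `mutate()` line can be dropped into the answer's pipeline right after `pivot_longer()`.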
Data
nci <- read.csv("https://raw.githubusercontent.com/khammerberg53/MLEC/main/nci.datanames.csv")
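The `rename(gene = X)` step relies on how `read.csv()` names columns. Assuming the file's first header cell is empty (as `write.csv()` produces when it writes row names), `read.csv()` repairs that blank name to `"X"`. The sketch below checks this on a hypothetical miniature CSV written to a temporary file, and also shows `row.names = 1` as an alternative that keeps the gene names as row names instead of a column:

```r
# Hypothetical miniature CSV whose first header cell is empty
tmp <- tempfile(fileext = ".csv")
writeLines(c('"","sampleA","sampleB"',
             '"gene1",1.5,0.2',
             '"gene2",0.3,2.4'), tmp)

# The empty header is turned into the syntactic name "X"
df <- read.csv(tmp)
names(df)

# Alternative: use the first column as row names
df2 <- read.csv(tmp, row.names = 1)
rownames(df2)
names(df2)
```

Whether the real file matches this layout is an assumption; `names(nci)` after reading it will confirm.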