I'm creating a heatmap in R using ggplot2. Each column in my dataset represents a different variable, and I want each variable (column) to have its own color gradient. So far I haven't been able to achieve this.
I have a data frame, data, containing counts of several attributes for different species. Here's a sample subset of my data:
Species Total LSE Ortholog Truncated Pseudogenes
Wirenia_argentea 258 115 143 19 10
Gymnomenia_pellucida 260 96 164 7 3
Epimenia_babai 511 350 161 15 68
Acanthochitona_crinita 220 52 168 10 0
Acanthopleura_granulata 157 31 126 2 9
Mopalia_swanii 527 278 249 31 104
Mopalia_vespertina 491 249 242 13 57
Nautilus_pompilius 411 146 265 14 35
Argonauta_argo 137 5 132 0 4
Octopus_bimaculoides 192 11 181 2 4
Octopus_minor 236 43 193 3 NA
Octopus_sinensis 203 23 180 7 2
Octopus_vulgaris 329 51 278 2 5
Octopus_maya 161 33 128 2 8
Octopus_mimus 78 10 68 1 4
Octopus_insularis 170 43 127 1 10
Octopus_rubescens 197 42 155 5 16
Hapalochlaena_maculosa 172 3 169 0 0
Muusoctopus_leioderma* 152 10 142 1 2
Muusoctopus_longibrachus* 125 5 120 1 0
Japetella_diaphana 56 13 43 1 2
Sepia_pharaonis 162 35 127 0 7
Euprymna_scolopes 282 29 253 2 4
Octopoteuthis_deletron 46 6 40 0 NA
Watasenia_scintillans 52 10 42 3 3
Architeuthis_dux 323 34 289 3 7
Laevipilina_antarctica 164 62 102 6 NA
Gadila_tolmiei 140 53 87 9 1
Alviniconcha_marisindica 378 188 190 5 57
Batillaria_attramentaria 444 163 281 19 15
Melanoides_tuberculata 226 111 115 12 42
Babylonia_areolata 878 645 233 16 74
Conus_betulinus 1210 1071 139 265 773
Conus_consors 1560 1226 334 200 418
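To make the example reproducible, a few rows of the table above can be read straight into R. This is only a minimal sketch: the name sample_text is introduced here for illustration, and only three complete rows plus one row with a missing value are copied from the table.

```r
# A few rows copied from the table above (whitespace-separated, with header)
sample_text <- "Species Total LSE Ortholog Truncated Pseudogenes
Wirenia_argentea 258 115 143 19 10
Gymnomenia_pellucida 260 96 164 7 3
Epimenia_babai 511 350 161 15 68
Octopus_minor 236 43 193 3 NA"

# read.table() treats the string "NA" as a missing value by default
data <- read.table(text = sample_text, header = TRUE, stringsAsFactors = FALSE)

# Inspect the column types: Species is character, the counts are integer
str(data)
```

The missing Pseudogenes entries stay NA after reading, which is why the plotting code sets na.value for the fill scale.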
I've melted this data into a long format for ggplot2, resulting in a data frame melted_data with columns Species, variable, and value. For example:
Species variable value
Wirenia argentea Total 258
Gymnomenia pellucida Total 260
Epimenia babai Total 511
Acanthochitona crinita Total 220
Acanthopleura granulata Total 157
Mopalia swanii Total 527
Mopalia vespertina Total 491
Laevipilina antarctica Total 164
Gadila tolmiei Total 140
Wirenia argentea LSE 115
Gymnomenia pellucida LSE 96
Epimenia babai LSE 350
Acanthochitona crinita LSE 52
Acanthopleura granulata LSE 31
Mopalia swanii LSE 278
Mopalia vespertina LSE 249
Laevipilina antarctica LSE 62
Gadila tolmiei LSE 53
Wirenia argentea Ortholog 143
Gymnomenia pellucida Ortholog 164
Epimenia babai Ortholog 161
Acanthochitona crinita Ortholog 168
Acanthopleura granulata Ortholog 126
Mopalia swanii Ortholog 249
Mopalia vespertina Ortholog 242
Laevipilina antarctica Ortholog 102
Gadila tolmiei Ortholog 87
Wirenia argentea Truncated 19
Gymnomenia pellucida Truncated 7
Epimenia babai Truncated 15
Acanthochitona crinita Truncated 10
Acanthopleura granulata Truncated 2
Mopalia swanii Truncated 31
Mopalia vespertina Truncated 13
Laevipilina antarctica Truncated 6
Gadila tolmiei Truncated 9
Wirenia argentea Pseudogenes 10
Gymnomenia pellucida Pseudogenes 3
Epimenia babai Pseudogenes 68
Acanthochitona crinita Pseudogenes 0
Acanthopleura granulata Pseudogenes 9
Mopalia swanii Pseudogenes 104
Mopalia vespertina Pseudogenes 57
Laevipilina antarctica Pseudogenes NA
Gadila tolmiei Pseudogenes 1
I want to create a heatmap with the species names on the y-axis and the variables on the x-axis. Each column (variable) must have its own gradient. For example, the gradient for the Total column should be scaled independently of the one for the LSE column, and so on. Otherwise, the Total column will always hold the highest values and dominate a shared color scale.
I've attempted to create the heatmap using the following code:
library(reshape2)  # assumed source of melt()
library(ggplot2)

# Melt the data frame to long format for ggplot
melted_data <- melt(data, id.vars = "Species")
# Remove underscores from species names
melted_data$Species <- gsub("_", " ", melted_data$Species)
# Define the breakpoints and corresponding colors
# (breaks is defined here but not used in the plot below)
breaks <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1)
colors <- c("#e9eca7", "#c9f6c7", "#a8ecd2", "#92dfdc", "#8bd0df", "#5b8dce", "#4575b4", "#fca562", "#fc8d59")
# Plot the heatmap
heatmap_plot <- ggplot(melted_data, aes(x = variable, y = Species, fill = value)) +
  geom_tile(color = "white") +
  geom_text(aes(label = round(value, 2)), color = "black") +
  scale_fill_gradientn(colors = colors, na.value = "white") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_x_discrete() +
  coord_fixed(ratio = 1)
# Print the plot
print(heatmap_plot)
However, this code applies a single gradient across all columns. How can I modify it so that each column gets its own gradient? That would make the distribution of values within each variable much easier to see. Any suggestions or insights would be greatly appreciated.
Thank you!
One option would be to use the ggnewscale package, which allows for multiple scales for the same aesthetic. As this requires adding the tiles for each column via a separate geom_tile, I use purrr::imap to loop over the categories of variable and add the layers for each column.
In the code below I simply reused your color gradient, which is now applied to each column individually. Of course, it is also possible to adapt the code to use different colors for each column.
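library(ggplot2)
library(ggnewscale)

# Uses melted_data and colors as defined in the question.
# The \(x, y) lambda and the formula in split() need R >= 4.1.
ggplot(melted_data, aes(x = variable, y = Species, fill = value)) +
  purrr::imap(
    # One data frame per column; imap() passes the data as x and its name as y
    split(melted_data, ~variable),
    \(x, y) {
      list(
        # Start a new fill scale, then add the tiles and gradient for this column
        ggnewscale::new_scale_fill(),
        geom_tile(data = x, aes(fill = value)),
        scale_fill_gradientn(colors = colors, na.value = "white", name = y, guide = "none")
      )
    }
  ) +
  geom_text(aes(label = round(value, 2)), color = "black") +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1)
  ) +
  coord_fixed(ratio = 1)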
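To use different colors for each column, the same loop can look up a palette by column name. The following is only a sketch under a few assumptions: the palettes list and its color choices are hypothetical, the data is a small subset of the long-format values from the question, and base lapply stands in for purrr::imap to keep the dependencies down.

```r
library(ggplot2)
library(ggnewscale)

# Small illustrative subset of the long-format data from the question
melted_data <- data.frame(
  Species = rep(c("Wirenia argentea", "Gymnomenia pellucida", "Epimenia babai"), times = 3),
  variable = factor(
    rep(c("Total", "LSE", "Pseudogenes"), each = 3),
    levels = c("Total", "LSE", "Pseudogenes")
  ),
  value = c(258, 260, 511, 115, 96, 350, 10, 3, 68)
)

# Hypothetical palettes, one per column; the colors are arbitrary choices
palettes <- list(
  Total = c("#f7fbff", "#08519c"),
  LSE = c("#f7fcf5", "#006d2c"),
  Pseudogenes = c("#fff5eb", "#a63603")
)

# For each column: start a new fill scale, draw that column's tiles,
# and attach the gradient looked up by column name
column_layers <- lapply(names(palettes), function(col_name) {
  list(
    new_scale_fill(),
    geom_tile(
      data = melted_data[melted_data$variable == col_name, ],
      aes(fill = value),
      color = "white"
    ),
    scale_fill_gradientn(
      colors = palettes[[col_name]],
      na.value = "white",
      name = col_name,
      guide = "none"
    )
  )
})

heatmap_plot <- ggplot(melted_data, aes(x = variable, y = Species, fill = value)) +
  column_layers +
  geom_text(aes(label = value), color = "black") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(heatmap_plot)
```

Each gradient spans only the range of its own column, so the lowest and highest value in every column map to the ends of that column's palette. Dropping guide = "none" should give one legend per column, titled with the column name.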