Creating a Heatmap with Unique Gradient for Each Column in R


I'm creating a heatmap in R with ggplot2. Each column in my dataset represents a different variable, and I want each variable (that is, each column of the heatmap) to have its own color gradient, but I can't work out how to do this.

I have a data frame, data, with counts of several attributes for different species. Here is a sample of it:

Species                     Total   LSE     Ortholog    Truncated   Pseudogenes
Wirenia_argentea            258     115     143         19          10
Gymnomenia_pellucida        260     96      164         7           3
Epimenia_babai              511     350     161         15          68
Acanthochitona_crinita      220     52      168         10          0
Acanthopleura_granulata     157     31      126         2           9
Mopalia_swanii              527     278     249         31          104
Mopalia_vespertina          491     249     242         13          57
Nautilus_pompilius          411     146     265         14          35
Argonauta_argo              137     5       132         0           4
Octopus_bimaculoides        192     11      181         2           4
Octopus_minor               236     43      193         3           NA
Octopus_sinensis            203     23      180         7           2
Octopus_vulgaris            329     51      278         2           5
Octopus_maya                161     33      128         2           8
Octopus_mimus               78      10      68          1           4
Octopus_insularis           170     43      127         1           10
Octopus_rubescens           197     42      155         5           16
Hapalochlaena_maculosa      172     3       169         0           0
Muusoctopus_leioderma*      152     10      142         1           2
Muusoctopus_longibrachus*   125     5       120         1           0
Japetella_diaphana          56      13      43          1           2
Sepia_pharaonis             162     35      127         0           7
Euprymna_scolopes           282     29      253         2           4
Octopoteuthis_deletron      46      6       40          0           NA
Watasenia_scintillans       52      10      42          3           3
Architeuthis_dux            323     34      289         3           7
Laevipilina_antarctica      164     62      102         6           NA
Gadila_tolmiei              140     53      87          9           1
Alviniconcha_marisindica    378     188     190         5           57
Batillaria_attramentaria    444     163     281         19          15
Melanoides_tuberculata      226     111     115         12          42
Babylonia_areolata          878     645     233         16          74
Conus_betulinus             1210    1071    139         265         773
Conus_consors               1560    1226    334         200         418

I've melted this data into a long format for ggplot2, resulting in a data frame melted_data with columns Species, variable, and value. For example:

Species                     variable    value
Wirenia argentea           Total       258
Gymnomenia pellucida       Total       260
Epimenia babai             Total       511
Acanthochitona crinita     Total       220
Acanthopleura granulata    Total       157
Mopalia swanii             Total       527
Mopalia vespertina         Total       491
Laevipilina antarctica     Total       164
Gadila tolmiei             Total       140
Wirenia argentea           LSE         115
Gymnomenia pellucida       LSE         96
Epimenia babai             LSE         350
Acanthochitona crinita     LSE         52
Acanthopleura granulata    LSE         31
Mopalia swanii             LSE         278
Mopalia vespertina         LSE         249
Laevipilina antarctica     LSE         62
Gadila tolmiei             LSE         53
Wirenia argentea           Ortholog    143
Gymnomenia pellucida       Ortholog    164
Epimenia babai             Ortholog    161
Acanthochitona crinita     Ortholog    168
Acanthopleura granulata    Ortholog    126
Mopalia swanii             Ortholog    249
Mopalia vespertina         Ortholog    242
Laevipilina antarctica     Ortholog    102
Gadila tolmiei             Ortholog    87
Wirenia argentea           Truncated   19
Gymnomenia pellucida       Truncated   7
Epimenia babai             Truncated   15
Acanthochitona crinita     Truncated   10
Acanthopleura granulata    Truncated   2
Mopalia swanii             Truncated   31
Mopalia vespertina         Truncated   13
Laevipilina antarctica     Truncated   6
Gadila tolmiei             Truncated   9
Wirenia argentea           Pseudogenes 10
Gymnomenia pellucida       Pseudogenes 3
Epimenia babai             Pseudogenes 68
Acanthochitona crinita     Pseudogenes 0
Acanthopleura granulata    Pseudogenes 9
Mopalia swanii             Pseudogenes 104
Mopalia vespertina         Pseudogenes 57
Laevipilina antarctica     Pseudogenes NA
Gadila tolmiei             Pseudogenes 1
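
To make the example reproducible, here is a minimal sketch that builds data from five rows of the table above and melts it. It assumes melt() is reshape2::melt, which produces the Species, variable and value columns shown; the choice of rows is only for illustration.

```r
library(reshape2)  # assumption: melt() comes from reshape2

# Five species typed in from the table above
data <- data.frame(
  Species     = c("Wirenia_argentea", "Gymnomenia_pellucida", "Epimenia_babai",
                  "Acanthochitona_crinita", "Laevipilina_antarctica"),
  Total       = c(258, 260, 511, 220, 164),
  LSE         = c(115, 96, 350, 52, 62),
  Ortholog    = c(143, 164, 161, 168, 102),
  Truncated   = c(19, 7, 15, 10, 6),
  Pseudogenes = c(10, 3, 68, 0, NA)
)

# Wide to long: one row per Species/variable pair
melted_data <- melt(data, id.vars = "Species")

# Replace underscores so the axis labels read as species names
melted_data$Species <- gsub("_", " ", melted_data$Species)

str(melted_data)
```

The variable column comes back as a factor whose levels follow the original column order, which is what fixes the left-to-right order of the heatmap columns.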

I want a heatmap with the species names on the y-axis and the variables on the x-axis. Each column (variable) must have its own gradient: for example, the gradient for the Total column should be independent of the one for the LSE column, and so on. Otherwise the Total column, which holds the largest values, will always dominate the shared color scale.

I've attempted to create the heatmap using the following code:

library(reshape2)  # assumed source of melt()
library(ggplot2)

# Melt the data frame to long format for ggplot
melted_data <- melt(data, id.vars = "Species")

# Remove underscores from species names
melted_data$Species <- gsub("_", " ", melted_data$Species)

# Define the breakpoints and corresponding colors
# (breaks is never passed to the scale below, so the 9 colors are spread
# evenly over the full range of value)
breaks <- c(0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1)
colors <- c("#e9eca7", "#c9f6c7", "#a8ecd2", "#92dfdc", "#8bd0df", "#5b8dce", "#4575b4", "#fca562", "#fc8d59")

# Plot the heatmap
heatmap_plot <- ggplot(melted_data, aes(x = variable, y = Species, fill = value)) +
  geom_tile(color = "white") +
  geom_text(aes(label = round(value, 2)), color = "black") +
  scale_fill_gradientn(colors = colors, na.value = "white") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  scale_x_discrete() +
  coord_fixed(ratio = 1)

# Print the plot
print(heatmap_plot)

However, this code applies a single gradient across all columns. How can I modify it so that each column gets its own gradient, so that the distribution of values within each variable is easier to see? Any suggestions or insights would be greatly appreciated.

Thank you!


Solution

  • One option is the ggnewscale package, which allows multiple scales for the same aesthetic. Because this requires adding the tiles for each column via a separate geom_tile, I use purrr::imap to loop over the categories of variable and add the layers for each column.

    In the code below I simply reuse your color gradient, which is now applied to each column individually. It is of course possible to adapt the code to use different colors for each column.

    library(ggplot2)
    library(ggnewscale)
    
    ggplot(melted_data, aes(x = variable, y = Species, fill = value)) +
      purrr::imap(
        split(melted_data, ~variable),
        \(x, y) {
          list(
            ggnewscale::new_scale_fill(),
            geom_tile(data = x, aes(fill = value)),
            scale_fill_gradientn(colors = colors, na.value = "white", name = y, guide = "none")
          )
        }
      ) +
      geom_text(aes(label = round(value, 2)), color = "black") +
      theme_minimal() +
      theme(
        axis.text.x = element_text(angle = 45, hjust = 1)
      ) +
      coord_fixed(ratio = 1)
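
As a sketch of that last point, the same loop can look up a different palette for each column by name. The palettes list and its hex colors are hypothetical choices for illustration, and the small data frame stands in for melted_data using values from the question.

```r
library(ggplot2)
library(ggnewscale)

# Toy long-format data standing in for melted_data (values from the question)
melted_data <- data.frame(
  Species  = rep(c("Wirenia argentea", "Epimenia babai", "Gadila tolmiei"), times = 3),
  variable = factor(rep(c("Total", "LSE", "Pseudogenes"), each = 3),
                    levels = c("Total", "LSE", "Pseudogenes")),
  value    = c(258, 511, 140, 115, 350, 53, 10, 68, 1)
)

# Hypothetical palettes: one low/high color pair per column
palettes <- list(
  Total       = c("#f7fbff", "#08519c"),
  LSE         = c("#f7fcf5", "#006d2c"),
  Pseudogenes = c("#fff5eb", "#a63603")
)

# One new fill scale, one tile layer and one gradient per column;
# y is the column name, used to pick the palette and title the legend
layers <- purrr::imap(
  split(melted_data, melted_data$variable),
  \(x, y) {
    list(
      new_scale_fill(),
      geom_tile(data = x, aes(fill = value), color = "white"),
      scale_fill_gradientn(colors = palettes[[y]], na.value = "white", name = y)
    )
  }
)

p <- ggplot(melted_data, aes(x = variable, y = Species)) +
  layers +
  geom_text(aes(label = value), color = "black") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(p)
```

Here the legends are left visible so that each column's color range can be read off. Adding guide = "none" to the scale, as in the code above, hides them again.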
    

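
A different technique from the ggnewscale approach above: if separate legends are not needed, you could rescale value to the range 0 to 1 within each variable and map the rescaled column to a single fill scale, so every column spans the full gradient. This is a minimal sketch assuming min-max rescaling is acceptable for your data. rescale01 and the scaled column are illustrative names, and the three species are a subset of the question's data.

```r
library(ggplot2)

# Toy long-format data standing in for melted_data (values from the question)
melted_data <- data.frame(
  Species  = rep(c("Wirenia argentea", "Epimenia babai", "Laevipilina antarctica"), times = 3),
  variable = factor(rep(c("Total", "LSE", "Pseudogenes"), each = 3),
                    levels = c("Total", "LSE", "Pseudogenes")),
  value    = c(258, 511, 164, 115, 350, 62, 10, 68, NA)
)

# Color vector from the question
colors <- c("#e9eca7", "#c9f6c7", "#a8ecd2", "#92dfdc", "#8bd0df",
            "#5b8dce", "#4575b4", "#fca562", "#fc8d59")

# Min-max rescaling that ignores NAs; a constant column maps to 0
rescale01 <- function(v) {
  rng <- range(v, na.rm = TRUE)
  if (diff(rng) == 0) return(v * 0)
  (v - rng[1]) / (rng[2] - rng[1])
}

# Apply the rescaling separately within each variable
melted_data$scaled <- ave(melted_data$value, melted_data$variable, FUN = rescale01)

# Fill uses the rescaled values, labels keep the original counts
p <- ggplot(melted_data, aes(x = variable, y = Species, fill = scaled)) +
  geom_tile(color = "white") +
  geom_text(aes(label = value), color = "black", na.rm = TRUE) +
  scale_fill_gradientn(colors = colors, na.value = "white",
                       name = "Scaled within column") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(p)
```

The trade-off is that the legend no longer shows raw counts, only each value's relative position within its own column, which is why the tile labels still use value.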