Search code examples
rggplot2reshape2melt

How to melt and plot multiple datasets over different ranges on the same set of axis?


This is my first time posting here, I hope my question is clear and appropriate. I have a set of data the head of which looks like this:

   wl     ex421  wl     ex309  wl      ex284  wl      ex347
1 431 0.6168224 321 0.1267943 301 0.06392694 361 0.15220484
2 432 0.6687435 322 0.2416268 302 0.05631659 362 0.08961593
3 433 0.6583593 323 0.4665072 303 0.05327245 363 0.13134187
4 434 0.6832814 324 0.3576555 304 0.00000000 364 0.32432432
5 435 0.6427830 325 0.2194976 305 0.12328767 365 0.50308203
6 436 0.7393562 326 0.1866029 306 0.08675799 366 0.34660977

and so on. The 'wl' columns represent wavelength, and there are four different ranges. The other four columns represent measurements (normalized) taken over the 'wl' ranges. The ranges are of different lengths, too. All of them partially overlap somewhere in the middle of the dataset. What I need to achieve is a plot showing all four sets of 'ex###' data on the same set of axis and plotted over their respective ranges. The x-axis needs to accommodate all four 'wl' ranges. However, I haven't yet succeeded.

When I had to plot multiple sets of data like this in the past I just melted the data and it always worked. Something like this:

df_melt <- melt(df, id.var = 'wl')

And then I'd plot it like this:

fluor_plt <- ggplot(fluor_ref2_melt, aes(x=wl,y=value,color=variable)) + 
geom_point(shape = 1, fill = NA) + geom_path(data = fluor_ref2_melt,size = 1) +
  theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) + 
  scale_colour_manual(values = colvec)

However, because I have multiple columns with the name 'wl', which also have different ranges, what happens is that R only takes the first 'wl' column and discards all the other ones. It then basically shifts all the 'ex###' values into that range by using the row index... so I get a plot of the frame below:

   wl     ex421    ex309    ex284      ex347
1 431 0.6168224 0.1267943 0.06392694 0.15220484
2 432 0.6687435 0.2416268 0.05631659 0.08961593
3 433 0.6583593 0.4665072 0.05327245 0.13134187
4 434 0.6832814 0.3576555 0.00000000 0.32432432
5 435 0.6427830 0.2194976 0.12328767 0.50308203
6 436 0.7393562 0.1866029 0.08675799 0.34660977

Needless to say, this is entirely wrong... So one way I tried to circumvent the issue is going into Excel and manually moving columns up and down, so that in the dataframe each row corresponds to one 'wl' value, whether there are any measured values associated with it or not. This got rid of the values being 'shifted', but R still discards the 'wl' columns after the first one. Instead of getting an entirely wrong plot, I get a section of the right one. The first set of observations (ex421) is plotted over its entire range; pieces of the other ones are seen where ranges overlap. I've looked at some similar cases which were asked about here in the past, like this - Reshape data frame from wide to long with re-occuring column names in R. But I'm new to R and I don't think I could fully understand the proposed solutions. I didn't succeed in reshaping my data in the way I want it to be reshaped (keeping different 'wl' ranges for different sets) and I had no idea which arguments to give to ggplot afterwards. I've tried using data.table, but then I don't know what to give it for value.name and variable.name. To reiterate, what I want to achieve is what one would get from plotting the four datasets in the spreadsheet by making a single Scatter plot in Excel and adding four different series to it.

Any input would be greatly appreciated!


Solution

  • Here I load a data frame with your data, making sure to allow repeated names with check.names = F, otherwise it would rename the wl columns to be distinct:

    df <- read.table(
      header = T, check.names = F,
      stringsAsFactors = F,
      text = "   wl     ex421  wl     ex309  wl      ex284  wl      ex347
     431 0.6168224 321 0.1267943 301 0.06392694 361 0.15220484
     432 0.6687435 322 0.2416268 302 0.05631659 362 0.08961593
     433 0.6583593 323 0.4665072 303 0.05327245 363 0.13134187
     434 0.6832814 324 0.3576555 304 0.00000000 364 0.32432432
     435 0.6427830 325 0.2194976 305 0.12328767 365 0.50308203
     436 0.7393562 326 0.1866029 306 0.08675799 366 0.34660977")
    

    Then here's a way to reshape, by just stacking subsets of the data. Since there weren't too many column pairs, I thought a semi-manual method would be ok. It preserves the distinct column headers so we can gather those into long form and map to color like in your plot.

    library(tidyverse)
    df2 <- bind_rows(
      df[1:2],
      df[3:4],
      df[5:6],
      df[7:8]
    ) %>%
      gather(variable, value, -wl) %>%
      drop_na()
    
    
    ggplot(df2, aes(x=wl,y=value,color=variable)) + 
      geom_point(shape = 1, fill = NA) + 
      geom_path(size = 1) +
      theme(panel.grid.major = element_blank(), 
            panel.grid.minor = element_blank())
    

    enter image description here