Search code examples
rveganphyloseq

Coloring Rarefaction curve lines by metadata (vegan package) (phyloseq package)


First time question asker here. I wasn't able to find an answer to this question in other posts (love stackexchange, btw).

Anyway... I'm creating a rarefaction curve via the vegan package and I'm getting a very messy plot that has a very thick black bar at the bottom of the plot which is obscuring some low diversity sample lines. Ideally, I would like to generate a plot with all of my lines (169; I could reduce this to 144) but make a composite graph, coloring by Sample Year and making different types of lines for each Pond (i.e: 2 sample years: 2016, 2017 and 3 ponds: 1,2,5). I've used phyloseq to create an object with all my data, then separated my OTU abundance table from my metadata into distinct objects (jt = OTU table and sampledata = metadata). My current code:

 jt <- as.data.frame(t(j)) # transform it to make it compatible with the proceeding commands
rarecurve(jt
          , step = 100
          , sample = 6000
          , main = "Alpha Rarefaction Curve"
          , cex = 0.2
          , color = sampledata$PondYear)

# A very small subset of the sample metadata
                  Pond    Year
F16.5.d.1.1.R2     5      2016
F17.1.D.6.1.R1     1      2017
F16.1.D15.1.R3     1      2016
F17.2.D00.1.R2     2      2017

enter image description here


Solution

  • Here is an example of how to plot a rarefaction curve with ggplot. I used data available in the phyloseq package available from bioconductor.

    to install phyloseq:

    source('http://bioconductor.org/biocLite.R')
    biocLite('phyloseq')
    library(phyloseq)
    

    other libraries needed

    library(tidyverse)
    library(vegan)
    

    data:

    mothlist <- system.file("extdata", "esophagus.fn.list.gz", package = "phyloseq")
    mothgroup <- system.file("extdata", "esophagus.good.groups.gz", package = "phyloseq")
    mothtree <- system.file("extdata", "esophagus.tree.gz", package = "phyloseq")
    cutoff <- "0.10"
    esophman <- import_mothur(mothlist, mothgroup, mothtree, cutoff)
    

    extract OTU table, transpose and convert to data frame

    otu <- otu_table(esophman)
    otu <- as.data.frame(t(otu))
    sample_names <- rownames(otu)
    
    out <- rarecurve(otu, step = 5, sample = 6000, label = T)
    

    Now you have a list each element corresponds to one sample:

    Clean the list up a bit:

    rare <- lapply(out, function(x){
      b <- as.data.frame(x)
      b <- data.frame(OTU = b[,1], raw.read = rownames(b))
      b$raw.read <- as.numeric(gsub("N", "",  b$raw.read))
      return(b)
    })
    

    label list

    names(rare) <- sample_names
    

    convert to data frame:

    rare <- map_dfr(rare, function(x){
      z <- data.frame(x)
      return(z)
    }, .id = "sample")
    

    Lets see how it looks:

    head(rare)
      sample       OTU raw.read
    1      B  1.000000        1
    2      B  5.977595        6
    3      B 10.919090       11
    4      B 15.826125       16
    5      B 20.700279       21
    6      B 25.543070       26
    

    plot with ggplot2

    ggplot(data = rare)+
      geom_line(aes(x = raw.read, y = OTU, color = sample))+
      scale_x_continuous(labels =  scales::scientific_format())
    

    enter image description here

    vegan plot:

    rarecurve(otu, step = 5, sample = 6000, label = T) #low step size because of low abundance
    

    enter image description here

    One can make an additional column of groupings and color according to that.

    Here is an example how to add another grouping. Lets assume you have a table of the form:

    groupings <- data.frame(sample = c("B", "C", "D"),
                           location = c("one", "one", "two"), stringsAsFactors = F)
    groupings
      sample location
    1      B      one
    2      C      one
    3      D      two
    

    where samples are grouped according to another feature. You could use lapply or map_dfr to go over groupings$sample and label rare$location.

    rare <- map_dfr(groupings$sample, function(x){ #loop over samples
      z <- rare[rare$sample == x,] #subset rare according to sample 
      loc <- groupings$location[groupings$sample == x] #subset groupings according to sample, if more than one grouping repeat for all
      z <- data.frame(z, loc) #make a new data frame with the subsets
      return(z)
    })
    
    head(rare)
      sample       OTU raw.read loc
    1      B  1.000000        1 one
    2      B  5.977595        6 one
    3      B 10.919090       11 one
    4      B 15.826125       16 one
    5      B 20.700279       21 one
    6      B 25.543070       26 one
    

    Lets make a decent plot out of this

    ggplot(data = rare)+
      geom_line(aes(x = raw.read, y = OTU, group = sample, color = loc))+
      geom_text(data = rare %>% #here we need coordinates of the labels
                  group_by(sample) %>% #first group by samples
                  summarise(max_OTU = max(OTU), #find max OTU
                            max_raw = max(raw.read)), #find max raw read
                  aes(x = max_raw, y = max_OTU, label = sample), check_overlap = T, hjust = 0)+
      scale_x_continuous(labels =  scales::scientific_format())+
      theme_bw()
    

    enter image description here