
How to take random samples from a Phyloseq object


I have a Phyloseq object like the following:

[Image: Phyloseq object]

My goal is to take a random sample of size n from this object. I have tried every sampling function in the phyloseq package without success, and other sampling methods do not work on phyloseq objects. I considered converting the object to a data frame and sampling from that, but I am unsure how to convert the result back into an equivalent phyloseq object with fewer rows.

If anyone has a way to take random samples from a phyloseq object, I would greatly appreciate your insight. Thanks!


Solution

  • You can randomly sample from the vector of sample names returned by sample_names(), and then use prune_samples() to reduce the phyloseq object to those samples.

    library(phyloseq)
    
    # Load example data
    data("GlobalPatterns")
    ps <- GlobalPatterns
    
    # Sample from a phyloseq object with a sampling function.
    #   ps: phyloseq object to be sampled
    #   FUN: function used to draw sample names (default `sample`)
    #   ...: arguments passed on to FUN, e.g. `size`; see `help(sample)`
    sample_ps <- function(ps, FUN = sample, ...){
      ids <- sample_names(ps)
      sampled_ids <- FUN(ids, ...)
      ps <- prune_samples(sampled_ids, ps)
      return(ps)
    }
    
    sample_ps(ps, size=10)
    #> phyloseq-class experiment-level object
    #> otu_table()   OTU Table:         [ 19216 taxa and 10 samples ]
    #> sample_data() Sample Data:       [ 10 samples by 7 sample variables ]
    #> tax_table()   Taxonomy Table:    [ 19216 taxa by 7 taxonomic ranks ]
    #> phy_tree()    Phylogenetic Tree: [ 19216 tips and 19215 internal nodes ]
    

    Created on 2023-03-08 by the reprex package (v2.0.1)
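
If the random draw needs to be reproducible, the same idea can be written out inline with a fixed seed. The following is a minimal sketch, assuming the GlobalPatterns example data from above; the seed value and the names keep_ids and ps_sub are arbitrary choices for illustration, and the final prune_taxa() step is optional.

```r
library(phyloseq)

# Load the example data used above
data("GlobalPatterns")
ps <- GlobalPatterns

# Fix the RNG seed so the same samples are drawn on every run
# (42 is an arbitrary value chosen for illustration)
set.seed(42)

# Draw 10 sample names; sample() draws without replacement by default
keep_ids <- sample(sample_names(ps), size = 10)

# Keep only the drawn samples
ps_sub <- prune_samples(keep_ids, ps)

# Optional: drop taxa that have zero counts across the retained samples
ps_sub <- prune_taxa(taxa_sums(ps_sub) > 0, ps_sub)

nsamples(ps_sub)
```

Rerunning the block with the same seed should return the same subset of samples, while changing or removing the set.seed() call gives a different draw each time.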