Search code examples
databasetaxonomyphyloseqqiime

How to add and specify additional ranks to phyloseq taxonomy?


Is it possible to specify the number and names of taxonomic ranks when reading in data to a phyloseq object? When creating a phyloseq object from qiime output, e.g.,

ps=qza_to_phyloseq(features="features.qza", taxonomy="taxonomy.qza", metadata="metadata.qza")

the object defaults to 7 taxonomic levels, i.e., Kingdom, Phylum, Class, Order, Family, Genus, and Species.

I am using the PR2 database which is currently utilizing 9 taxonomic levels, i.e, Domain (replacing Kingdom), Supergroup, Division, Subdivision (new taxonomic rank added), Class, Order, Family, Genus, and Species. (See here for details: https://pr2-database.org/documentation/pr2-taxonomy-9-levels/). This means that the taxonomy does not match the grouping, i.e., Kingdom is actually a Domain, and the genus and species are missing entirely.

Phyloseq is a bit of a black box to me, so I am not sure where I can override the default. Any help would be greatly appreciated.


Solution

  • I figured out where this step is occurring parse_taxonomyis a function of qiime2R that hard codes the 7 taxonomic levels we all learned in grade school.

    Simply edit the function and save it. For The current 9-level PR2 database, your code will look like this:

    trace("parse_taxonomy", edit=TRUE)
    
    function (taxonomy, tax_sep, trim_extra) 
    {
        if (missing(taxonomy)) {
            stop("Taxonomy Table not supplied.")
        }
        if (missing(trim_extra)) {
            trim_extra = TRUE
        }
        if (missing(tax_sep)) {
            tax_sep = "; |;"
        }
        if (sum(colnames(taxonomy) %in% c("Feature.ID", "Taxon")) != 
            2) {
            stop("Table does not match expected format. ie does not have columns Feature.ID and Taxon.")
        }
        taxonomy <- taxonomy[, c("Feature.ID", "Taxon")]
        if (trim_extra) {
            taxonomy$Taxon <- gsub("[kpcofgs]__", "", taxonomy$Taxon)
            taxonomy$Taxon <- gsub("D_\\d__", "", taxonomy$Taxon)
        }
        taxonomy <- suppressWarnings(taxonomy %>% separate(Taxon, 
            c("Domain", "Supergroup", "Division", "Subdivision", 
                "Class", "Order", "Family", "Genus", "Species"), 
            sep = tax_sep))
        taxonomy <- apply(taxonomy, 2, function(x) if_else(x == "", 
            NA_character_, x))
        taxonomy <- as.data.frame(taxonomy)
        rownames(taxonomy) <- taxonomy$Feature.ID
        taxonomy$Feature.ID <- NULL
        return(taxonomy)
    }
    

    Save and close! Note that this will not permanently change the function, only in your current session.