Is it possible to specify the number and names of taxonomic ranks when reading in data to a phyloseq object? When creating a phyloseq object from qiime output, e.g.,
ps=qza_to_phyloseq(features="features.qza", taxonomy="taxonomy.qza", metadata="metadata.qza")
the object defaults to 7 taxonomic levels, i.e., Kingdom, Phylum, Class, Order, Family, Genus, and Species.
I am using the PR2 database which is currently utilizing 9 taxonomic levels, i.e, Domain (replacing Kingdom), Supergroup, Division, Subdivision (new taxonomic rank added), Class, Order, Family, Genus, and Species. (See here for details: https://pr2-database.org/documentation/pr2-taxonomy-9-levels/). This means that the taxonomy does not match the grouping, i.e., Kingdom is actually a Domain, and the genus and species are missing entirely.
Phyloseq is a bit of a black box to me, so I am not sure where I can override the default. Any help would be greatly appreciated.
I figured out where this step is occurring parse_taxonomy
is a function of qiime2R
that hard codes the 7 taxonomic levels we all learned in grade school.
Simply edit the function and save it. For The current 9-level PR2 database, your code will look like this:
trace("parse_taxonomy", edit=TRUE)
function (taxonomy, tax_sep, trim_extra)
{
if (missing(taxonomy)) {
stop("Taxonomy Table not supplied.")
}
if (missing(trim_extra)) {
trim_extra = TRUE
}
if (missing(tax_sep)) {
tax_sep = "; |;"
}
if (sum(colnames(taxonomy) %in% c("Feature.ID", "Taxon")) !=
2) {
stop("Table does not match expected format. ie does not have columns Feature.ID and Taxon.")
}
taxonomy <- taxonomy[, c("Feature.ID", "Taxon")]
if (trim_extra) {
taxonomy$Taxon <- gsub("[kpcofgs]__", "", taxonomy$Taxon)
taxonomy$Taxon <- gsub("D_\\d__", "", taxonomy$Taxon)
}
taxonomy <- suppressWarnings(taxonomy %>% separate(Taxon,
c("Domain", "Supergroup", "Division", "Subdivision",
"Class", "Order", "Family", "Genus", "Species"),
sep = tax_sep))
taxonomy <- apply(taxonomy, 2, function(x) if_else(x == "",
NA_character_, x))
taxonomy <- as.data.frame(taxonomy)
rownames(taxonomy) <- taxonomy$Feature.ID
taxonomy$Feature.ID <- NULL
return(taxonomy)
}
Save and close! Note that this will not permanently change the function, only in your current session.