I am doing microbiome data analysis in R using the phyloseq package. The first step of this analysis is to import two files: a .biom file (taxonomic information) and a metadata file (tab-delimited .txt).
Both files contain 147 samples, which are listed in the first column (#SampleID), for example 001, 002, 003, ..., 010, 011, ..., 147.
I can successfully import the BIOM file with the following command:
biom_file = "otu_table.biom"
biomot = import_biom(biom_file, parseFunction = parse_taxonomy_greengenes)
But when I try to import the metadata .txt file with this command,
map_file = "map2.txt"
bmsd = import_qiime_sample_data(map_file)
it removes all leading zeros from the sample names in the #SampleID column. As a result, the sample names no longer match those in the BIOM file, and I am unable to merge the two objects in the subsequent steps of the analysis. Could somebody please tell me how I can keep those leading zeros in the sample names in the #SampleID column?
Thank you for your help.
The data structure in .txt input file
import_qiime_sample_data is defined as:
import_qiime_sample_data <- function (mapfilename)
{
    QiimeMap <- read.table(file = mapfilename, header = TRUE,
        sep = "\t", comment.char = "")
    rownames(QiimeMap) <- as.character(QiimeMap[, 1])
    return(sample_data(QiimeMap))
}
As you can see, it uses read.table, which by default converts columns that contain only numbers to integer/numeric, thus dropping the leading zeros.
To avoid this, you could specify the desired column classes for the txt -> data.frame conversion (the colClasses argument of read.table), but unfortunately import_qiime_sample_data does not let you pass that argument through.
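The effect of colClasses can be seen on a small inline table. This is only a sketch: the three-row table and the names txt, default and kept are made up for illustration and are not part of the original files.

```r
# Hypothetical three-row mapping table, written inline for illustration
txt <- "#SampleID\tGroup\n001\tA\n002\tB\n010\tA"

# Default type guessing: the ID column is read as integer, so "001" becomes 1
default <- read.table(text = txt, header = TRUE, sep = "\t", comment.char = "")
class(default[[1]])

# Forcing character columns keeps the IDs exactly as written in the file
kept <- read.table(text = txt, header = TRUE, sep = "\t", comment.char = "",
                   colClasses = "character")
kept[[1]]
```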
Hence you should import the file manually:
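# map_file is the path defined in the question ("map2.txt")
tmpDF <- read.table(file = map_file, header = TRUE, sep = "\t",
                    comment.char = "", colClasses = "character")
# use the untouched sample IDs (first column) as row names
row.names(tmpDF) <- as.character(tmpDF[[1]])
bmsd <- sample_data(tmpDF)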
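One side effect of colClasses = "character" is that every column, including numeric metadata, is read as text. If that matters for your analysis, a possible variant is to read everything as character and then let R re-guess the types of all columns except the first. The helper name read_map_keep_ids below is hypothetical (it is not part of phyloseq), and the sketch assumes the sample IDs are in the first column.

```r
# Hypothetical helper: read a tab-delimited mapping file, keep the first
# column (sample IDs) as character, and re-guess the types of the rest.
read_map_keep_ids <- function(path) {
  df <- read.table(file = path, header = TRUE, sep = "\t",
                   comment.char = "", colClasses = "character")
  if (ncol(df) > 1) {
    # as.is = TRUE keeps text columns as character instead of factor
    df[-1] <- lapply(df[-1], type.convert, as.is = TRUE)
  }
  # sample IDs, with leading zeros intact, become the row names
  row.names(df) <- df[[1]]
  df
}
```

With phyloseq loaded, the result could then be passed on as before, e.g. bmsd <- sample_data(read_map_keep_ids(map_file)).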