
Error with BiocParallel. No barcodes files found


I am trying to implement a pipeline using SingleCellExperiment and DropletUtils in R. I have been getting an error for a while now and have tried everything I can think of to fix it, but nothing works. This is the code I am running:

seq_data_martin <- "/home/.../GSE134809_RAW"

samples <- list.files(path = seq_data_martin, pattern = "*matrix.mtx.gz")
samples <- gsub("_matrix.mtx.gz$", "", samples)

sce <- DropletUtils::read10xCounts(
  samples = paste0("seq_data_martin/", samples, "_"),
  sample.names = samples,
  type = "prefix",
  BPPARAM = BiocParallel::MulticoreParam()
)

The error I get when running this code is:

Error: BiocParallel errors
  31 remote errors, element index: 1, 2, 3, 4, 5, 6, ...
  0 unevaluated and other errors
  first remote error:
Error in .check_for_compressed(barcode.loc, compressed): cannot find 'seq_data_martin/GSM3972009_69_barcodes.tsv' or its gzip-compressed form

I have tried unzipping the barcode files and changing the permissions of the directories, but the error persists. The barcode files are located in the expected directory:

/GSE134809_RAW$ ll | head
total 592568
drwxrwxr-x  2 localadmin localadmin     4096 May 23 14:19 ./
drwxr-xr-x 39 localadmin localadmin     4096 May 23 14:16 ../
-rw-rw-r--  1 localadmin localadmin  1155350 May 23 14:19 GSM3972009_69_barcodes.tsv.gz
-rw-rw-r--  1 localadmin localadmin   264786 Jul 24  2019 GSM3972009_69_genes.tsv.gz
-rw-rw-r--  1 localadmin localadmin 12020368 Jul 24  2019 GSM3972009_69_matrix.mtx.gz
-rw-rw-r--  1 localadmin localadmin  1920076 Jul 24  2019 GSM3972010_68_barcodes.tsv.gz
-rw-rw-r--  1 localadmin localadmin   264786 Jul 24  2019 GSM3972010_68_genes.tsv.gz
-rw-rw-r--  1 localadmin localadmin 21788688 Jul 24  2019 GSM3972010_68_matrix.mtx.gz
-rw-rw-r--  1 localadmin localadmin  2280347 Jul 24  2019 GSM3972011_122_barcodes.tsv.gz

Can someone please help me fix this error?


Solution

  • By default, the function assumes files named simply barcodes.tsv(.gz), genes.tsv(.gz) and matrix.mtx(.gz), without any additional prefix or suffix. You can use type = "prefix" to tell it about a prefix such as GSM3972009_69_, but then you need to run it once per prefix (no problem) and perhaps cbind the resulting SingleCellExperiment objects together. Or you can do it all manually: after all, it is nothing more than read.delim(...) for the tsv files and Matrix::readMM() for the mtx files, after which you build your SingleCellExperiment by hand.

    For discussion, see the cross-post on Biostars: https://www.biostars.org/p/9595552/
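
A minimal sketch of the once-per-prefix approach follows. Note that the path in the error message starts with the literal text seq_data_martin/ rather than with the directory stored in the variable, so this sketch builds each prefix from the variable's value with file.path(). The helper build_prefixes() is hypothetical (it is not part of DropletUtils), and the code assumes that every sample shares the same genes in the same order, which cbind needs. The "..." in the directory is left exactly as it appears in the question, so the reading step is simply skipped unless that directory exists.

```r
# Directory from the question; the "..." is elided there and is left as-is.
seq_data_martin <- "/home/.../GSE134809_RAW"

# Hypothetical helper: one file prefix per sample, built from the *value* of
# `data_dir` (not from a quoted variable name) and named by sample ID.
build_prefixes <- function(data_dir) {
  mtx <- list.files(data_dir, pattern = "_matrix\\.mtx\\.gz$")
  ids <- sub("_matrix\\.mtx\\.gz$", "", mtx)
  stats::setNames(file.path(data_dir, paste0(ids, "_")), ids)
}

# Only attempt the read when the directory and the packages are available.
if (dir.exists(seq_data_martin) &&
    requireNamespace("DropletUtils", quietly = TRUE) &&
    requireNamespace("SingleCellExperiment", quietly = TRUE)) {
  # Attaching SingleCellExperiment makes its cbind method available below.
  suppressPackageStartupMessages(library(SingleCellExperiment))

  prefixes <- build_prefixes(seq_data_martin)

  # Fail early if any barcode file is missing at the constructed path.
  stopifnot(file.exists(paste0(prefixes, "barcodes.tsv.gz")))

  # Read each sample separately, one prefix at a time.
  sce_list <- lapply(names(prefixes), function(id) {
    DropletUtils::read10xCounts(
      samples = prefixes[[id]],
      sample.names = id,
      type = "prefix"
    )
  })

  # Assumption: all samples have identical rows, so they can be column-bound.
  sce <- do.call(cbind, sce_list)
}
```

Printing build_prefixes(seq_data_martin) before reading is a quick way to confirm that each prefix is a full path ending in an underscore, for example one ending in GSE134809_RAW/GSM3972009_69_.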