I downloaded a raw data set from GEO (https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE92332), which contains single-cell analysis data. It comes as three files: matrix.mtx.gz, barcodes.tsv.gz and genes.tsv.gz.
I then tried to load the data with this code:
#Load data
data_file = "/Users/---/desktop/single-cell-tutorial/latest_notebook/GSE92332_RAW"
adata = sc.read(data_file, cache=True)
adata = adata.transpose()
adata.X = adata.X.toarray()
But I always get the following value error
ValueError: Reading with filekey '/Users/---/desktop/single-cell-tutorial/latest_notebook/GSE92332_RAW/MTX/mtx.gv' failed, the inferred filename PosixPath('/Users/---/desktop/single-cell-tutorial/latest_notebook/GSE92332_RAW/MTX/mtx.gv.h5ad') does not exist. If you intended to provide a filename, either use a filename ending on one of the available extensions {'csv', 'data', 'tab', 'h5ad', 'anndata', 'h5', 'tsv', 'xlsx', 'loom', 'txt', 'mtx.gz', 'soft.gz', 'mtx'} or pass the parameter ext.
I understand that I need to add an extension, but whichever extension I add, I still get the same error.
I tried every extension that matches one of the file types (mtx.gz etc.), and I also made a separate folder containing only the MTX data and pointed to that, but nothing works.
The scanpy.read method reads a single file (.h5ad by default, as the inferred filename in the error shows), not a folder of 10x output. To load raw CellRanger MTX output, use the scanpy.read_10x_mtx method instead, e.g.:
import scanpy as sc
data_file = "path/to/GSE92332_RAW"
adata = sc.read_10x_mtx(data_file, cache=True)
As commented, the .mtx and .tsv files likely need to be unzipped (run gzip -d *.gz from the command line while in the folder). This is idiosyncratic to scanpy, which requires data with genes.tsv (pre-v3 CellRanger output) to be unzipped, whereas data with features.tsv (v3+ CellRanger output) can stay zipped. At least that's what the code shows.
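If you would rather do the unzipping from Python than from the shell, something like the following should work. It is only a sketch using the standard library; gunzip_folder is a name I made up, and unlike gzip -d it leaves the original .gz files in place.

```python
import gzip
import shutil
from pathlib import Path


def gunzip_folder(folder):
    """Decompress every *.gz file in `folder`, writing the result next to it.

    Hypothetical helper (not part of scanpy). The original .gz files are kept.
    """
    written = []
    for gz_path in sorted(Path(folder).glob("*.gz")):
        # with_suffix("") drops only the trailing ".gz": genes.tsv.gz -> genes.tsv
        out_path = gz_path.with_suffix("")
        with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
        written.append(out_path)
    return written
```

Calling gunzip_folder("path/to/GSE92332_RAW") once before sc.read_10x_mtx should leave matrix.mtx, genes.tsv and barcodes.tsv alongside the compressed copies.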
Since this appears to contain many runs, you may also need the prefix argument to specify which particular run you want to load.
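To see which prefixes are available, you could scan the folder for the matrix files. This is a rough sketch: find_10x_prefixes and load_run are hypothetical helpers, and it assumes the files are named <prefix>matrix.mtx(.gz), <prefix>genes.tsv and <prefix>barcodes.tsv, which I have not checked against this particular GEO download.

```python
from pathlib import Path


def find_10x_prefixes(folder):
    """Return the filename prefixes that precede 'matrix.mtx' or 'matrix.mtx.gz'."""
    prefixes = set()
    for path in Path(folder).iterdir():
        for ending in ("matrix.mtx.gz", "matrix.mtx"):
            if path.name.endswith(ending):
                prefixes.add(path.name[: -len(ending)])
                break
    return sorted(prefixes)


def load_run(folder, prefix):
    """Load a single run by prefix.

    scanpy is imported here so that find_10x_prefixes works without it installed.
    """
    import scanpy as sc

    return sc.read_10x_mtx(folder, prefix=prefix, cache=True)
```

You would then pick one entry from find_10x_prefixes("path/to/GSE92332_RAW") and pass it to load_run. A folder with unprefixed files gives an empty-string prefix.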