I have a data frame like the following (the row names are "1", "2", "3", ...). Since there are non-unique entries in each column, I cannot use any of them as row names.
gene cell count
a c1 1
a c2 1
a c3 4
b c1 3
b c2 1
b c3 1
f c1 3
d c8 9
e c11 1
Each gene is measured in each cell (meaning it has a value in the count column), but zero counts are not shown. For example, gene "a" has zero counts in cells c8 and c11, hence those rows do not appear.
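To make the example reproducible, here is a minimal sketch that rebuilds the table above as a data frame named `df` (the name the answer's code assumes). It also checks that each gene/cell pair occurs only once, which any reshaping approach here relies on, and counts how many pairs are implicit zeros.

```r
# Rebuild the example table; count is numeric (double), so the final
# sparse matrix will be a double-valued class such as dgCMatrix
df <- data.frame(
  gene  = c("a", "a", "a", "b", "b", "b", "f", "d", "e"),
  cell  = c("c1", "c2", "c3", "c1", "c2", "c3", "c1", "c8", "c11"),
  count = c(1, 1, 4, 3, 1, 1, 3, 9, 1)
)

# Each (gene, cell) pair should appear at most once; duplicates would make
# pivot_wider() produce list-columns instead of plain counts
stopifnot(anyDuplicated(df[, c("gene", "cell")]) == 0)

# Gene/cell combinations that are not listed are the implicit zeros
length(unique(df$gene)) * length(unique(df$cell)) - nrow(df)  # 16
```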
Now I want to reshape/convert the data frame into a dgCMatrix with the following arrangement
(genes as row names, cells as column names and count values as the entries):
c1 c2 c3 c8 c11
a 1 1 4 . .
b 3 1 1 . .
where "." corresponds to a zero count.
I tried reshape, reshape2 and as.matrix as suggested in many posts here, but without success.
Convert to wide format with pivot_wider() and set the gene column as row names first:
library(Matrix)
library(dplyr)
library(tidyr)
mat <- df %>%
  pivot_wider(id_cols = gene, names_from = cell, values_from = count,
              values_fill = list(count = 0)) %>%
  tibble::column_to_rownames("gene")
Then convert the result to a sparse matrix:
mat <- Matrix(as.matrix(mat), sparse = TRUE)
5 x 5 sparse Matrix of class "dgCMatrix"
c1 c2 c3 c8 c11
a 1 1 4 . .
b 3 1 1 . .
f 3 . . . .
d . . . 9 .
e . . . . 1
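As an alternative that skips the dense intermediate (which can get large with many genes and cells), note that the long table is already in (row, column, value) triplet form, so it can be passed straight to `Matrix::sparseMatrix()`. This is a minimal sketch assuming the `df` from the question; ordering the factor levels by first appearance is a choice made here to reproduce the row and column order shown above, not something the question requires.

```r
library(Matrix)

df <- data.frame(
  gene  = c("a", "a", "a", "b", "b", "b", "f", "d", "e"),
  cell  = c("c1", "c2", "c3", "c1", "c2", "c3", "c1", "c8", "c11"),
  count = c(1, 1, 4, 3, 1, 1, 3, 9, 1)
)

# Factor levels in order of first appearance, so rows come out as
# a, b, f, d, e and columns as c1, c2, c3, c8, c11
genes <- factor(df$gene, levels = unique(df$gene))
cells <- factor(df$cell, levels = unique(df$cell))

# Each row of df is one (i, j, x) triplet; sparseMatrix() builds a
# dgCMatrix directly, and any pair not listed is stored as an implicit zero
mat <- sparseMatrix(
  i = as.integer(genes),
  j = as.integer(cells),
  x = df$count,
  dims = c(nlevels(genes), nlevels(cells)),
  dimnames = list(levels(genes), levels(cells))
)
mat
```

One difference to keep in mind: if a gene/cell pair occurred more than once, `sparseMatrix()` would add the values together rather than complain, so the uniqueness check from the question's data still matters.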