
R: Sparse submatrix without reducing original matrix dimension


From this data frame df

  group   from     to weight
1     1   Joey   Joey      1
2     1   Joey Deedee      1
3     1 Deedee   Joey      1
4     1 Deedee Deedee      1
5     2 Johnny Johnny      1
6     2 Johnny  Tommy      1
7     2  Tommy Johnny      1
8     2  Tommy  Tommy      1

which can be created like this

df <- structure(list(group = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
                     from = structure(c(2L, 2L, 1L, 1L, 3L, 3L, 4L, 4L),
                                      .Label = c("Deedee", "Joey", "Johnny", "Tommy"),
                                      class = "factor"),
                     to = structure(c(2L, 1L, 2L, 1L, 3L, 4L, 3L, 4L),
                                    .Label = c("Deedee", "Joey", "Johnny", "Tommy"),
                                    class = "factor"),
                     weight = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)),
                .Names = c("group", "from", "to", "weight"),
                class = "data.frame", row.names = c(NA, -8L))

a sparse matrix mat can be obtained using the Matrix package

library(Matrix)
mat <- sparseMatrix(i = as.numeric(df$from), j = as.numeric(df$to),
                    x = df$weight,
                    dimnames = list(levels(df$from), levels(df$to)))

which looks like this:

4 x 4 sparse Matrix of class "dgCMatrix"
       Deedee Joey Johnny Tommy
Deedee      1    1      .     .
Joey        1    1      .     .
Johnny      .    .      1     1
Tommy       .    .      1     1


How can I create a sparse submatrix using df$group without reducing the original matrix dimension?

The result is supposed to look like this:

4 x 4 sparse Matrix of class "dgCMatrix"
       Deedee Joey Johnny Tommy
Deedee      1    1      .     .
Joey        1    1      .     .
Johnny      .    .      .     .
Tommy       .    .      .     .

First Idea

If I subset the data frame and create the submatrix

df1 <- subset(df, group == 1)
mat1 <- sparseMatrix(i = as.numeric(df1$from), j = as.numeric(df1$to),
                     x = df1$weight)

the result is a 2 x 2 sparse Matrix, which is not an option. Besides "losing two nodes", I would also have to filter the factor levels before using them as dimension names.

The trick may be to not lose factors when creating the matrix.
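The reason the first idea shrinks the matrix can be checked directly: subset() keeps all factor levels, but sparseMatrix() called without dims sizes the result from the largest row and column index present. A small sketch (df is rebuilt here with factor(), which yields the same levels as the structure() call above; m is just an illustrative name):

```r
library(Matrix)

# Same data as the structure() call in the question
df <- data.frame(
  group  = rep(1:2, each = 4),
  from   = factor(c("Joey", "Joey", "Deedee", "Deedee",
                    "Johnny", "Johnny", "Tommy", "Tommy")),
  to     = factor(c("Joey", "Deedee", "Joey", "Deedee",
                    "Johnny", "Tommy", "Johnny", "Tommy")),
  weight = rep(1L, 8)
)

df1 <- subset(df, group == 1)

# subset() keeps the unused factor levels ...
nlevels(df1$from)          # [1] 4
# ... but the largest level code that actually occurs is 2
max(as.integer(df1$from))  # [1] 2

# Without `dims`, sparseMatrix() defaults to c(max(i), max(j))
m <- sparseMatrix(i = as.integer(df1$from), j = as.integer(df1$to),
                  x = df1$weight)
dim(m)                     # [1] 2 2
```

So the factor levels are not lost; only the inferred dimension is too small.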

Second Idea

If I set df$weight to zero for the group I'm not interested in and create the submatrix

df2 <- df
df2$weight[df2$group == 2] <- 0   # zero out the weights of group 2
mat2 <- sparseMatrix(i = as.numeric(df2$from), j = as.numeric(df2$to),
                     x = df2$weight,
                     dimnames = list(levels(df$from), levels(df$to)))

the matrix has the right dimension and I can easily carry along the factor levels as dimension names, but the matrix now contains zeros:

4 x 4 sparse Matrix of class "dgCMatrix"
       Deedee Joey Johnny Tommy
Deedee      1    1      .     .
Joey        1    1      .     .
Johnny      .    .      0     0
Tommy       .    .      0     0

This is also not an option because row normalization creates NaNs and I run into trouble when I transform the matrix into a graph and perform network analysis.

Here, the trick may be to remove the explicitly stored zeros from the sparse matrix, but how?
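For reference, the Matrix package provides drop0(), which removes explicitly stored zeros from a sparse matrix without converting it to a dense one. A minimal sketch applied to the second idea (df is rebuilt with factor() so the snippet runs on its own; mat2_clean is an illustrative name). Note that this still materializes the zeros first, so it does more work than never storing them:

```r
library(Matrix)

# Same data as the structure() call in the question
df <- data.frame(
  group  = rep(1:2, each = 4),
  from   = factor(c("Joey", "Joey", "Deedee", "Deedee",
                    "Johnny", "Johnny", "Tommy", "Tommy")),
  to     = factor(c("Joey", "Deedee", "Joey", "Deedee",
                    "Johnny", "Tommy", "Johnny", "Tommy")),
  weight = rep(1L, 8)
)

# Second idea: zero out group 2 while keeping the full 4 x 4 dimension
df2 <- df
df2$weight[df2$group == 2] <- 0
mat2 <- sparseMatrix(i = as.integer(df2$from), j = as.integer(df2$to),
                     x = df2$weight,
                     dimnames = list(levels(df$from), levels(df$to)))

# drop0() removes stored zeros; dimension and dimnames are unchanged
mat2_clean <- drop0(mat2)
mat2_clean
```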

In any case, the solution must be as efficient as possible because the matrices get very large.


Solution

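  • Basically your first idea, but pass the full dimensions via dims so that sparseMatrix() does not shrink the result to the largest index used: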

    mat1 <- sparseMatrix(i = as.numeric(df1$from), j = as.numeric(df1$to),
                         x = df1$weight,
                         dims = c(nlevels(df$from), nlevels(df$to)),
                         dimnames = list(levels(df$from), levels(df$to)))
    
    #4 x 4 sparse Matrix of class "dgCMatrix"
    #       Deedee Joey Johnny Tommy
    #Deedee      1    1      .     .
    #Joey        1    1      .     .
    #Johnny      .    .      .     .
    #Tommy       .    .      .     .
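If a full-dimension submatrix is needed for every group, the same dims/dimnames approach can be applied per group. A minimal sketch, assuming the groups are given by df$group as in the question; full_dims, dn, idx and mats are illustrative names, and split() is applied to row indices rather than to the data frame to avoid copying its columns:

```r
library(Matrix)

# Same data as the structure() call in the question
df <- data.frame(
  group  = rep(1:2, each = 4),
  from   = factor(c("Joey", "Joey", "Deedee", "Deedee",
                    "Johnny", "Johnny", "Tommy", "Tommy")),
  to     = factor(c("Joey", "Deedee", "Joey", "Deedee",
                    "Johnny", "Tommy", "Johnny", "Tommy")),
  weight = rep(1L, 8)
)

# Full dimensions and names come from all factor levels
full_dims <- c(nlevels(df$from), nlevels(df$to))
dn        <- list(levels(df$from), levels(df$to))

# Row indices of df, grouped by df$group (list names are "1", "2")
idx <- split(seq_len(nrow(df)), df$group)

# One sparse matrix per group, each with the full 4 x 4 dimension
mats <- lapply(idx, function(r) {
  sparseMatrix(i = as.integer(df$from[r]), j = as.integer(df$to[r]),
               x = df$weight[r], dims = full_dims, dimnames = dn)
})

mats[["1"]]  # the group-1 submatrix, same as mat1 above
```

Because no zeros are written for the other groups, each matrix stores only its own entries.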