From this data frame df
  group   from     to weight
1     1   Joey   Joey      1
2     1   Joey Deedee      1
3     1 Deedee   Joey      1
4     1 Deedee Deedee      1
5     2 Johnny Johnny      1
6     2 Johnny  Tommy      1
7     2  Tommy Johnny      1
8     2  Tommy  Tommy      1
which can be created like this:
df <- structure(list(group = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
    from = structure(c(2L, 2L, 1L, 1L, 3L, 3L, 4L, 4L),
        .Label = c("Deedee", "Joey", "Johnny", "Tommy"), class = "factor"),
    to = structure(c(2L, 1L, 2L, 1L, 3L, 4L, 3L, 4L),
        .Label = c("Deedee", "Joey", "Johnny", "Tommy"), class = "factor"),
    weight = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)),
    .Names = c("group", "from", "to", "weight"),
    class = "data.frame", row.names = c(NA, -8L))
a sparse matrix mat can be obtained using the Matrix package:
library(Matrix)
mat <- sparseMatrix(i = as.numeric(df$from), j = as.numeric(df$to),
                    x = df$weight,
                    dimnames = list(levels(df$from), levels(df$to)))
which looks like this:
4 x 4 sparse Matrix of class "dgCMatrix"
       Deedee Joey Johnny Tommy
Deedee      1    1      .     .
Joey        1    1      .     .
Johnny      .    .      1     1
Tommy       .    .      1     1
How can I create a sparse submatrix for a single group in df$group without reducing the original matrix dimensions?
For group 1, the result is supposed to look like this:
4 x 4 sparse Matrix of class "dgCMatrix"
       Deedee Joey Johnny Tommy
Deedee      1    1      .     .
Joey        1    1      .     .
Johnny      .    .      .     .
Tommy       .    .      .     .
First Idea
If I subset the data frame and create the submatrix
df1 <- subset(df, group == 1)
mat1 <- sparseMatrix(i = as.numeric(df1$from), j = as.numeric(df1$to),
                     x = df1$weight)
the result is a 2 x 2 sparse Matrix. This is not an option. Besides "losing two nodes," I would also have to filter the factor levels to be used as dimension names.
The trick may be to not lose the factor levels when creating the matrix.
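A quick check of why the first idea shrinks the matrix, as a minimal sketch. It assumes the Matrix package is installed and rebuilds df with data.frame() instead of the dput() output above (the factor levels come out identical). subset() does keep every factor level; the 2 x 2 shape comes from sparseMatrix() sizing the result from the largest row and column index when dims is not given, and group 1 only uses codes 1 and 2.

```r
library(Matrix)

# Same data as the dput() output above, built with data.frame() for brevity
df <- data.frame(
  group  = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  from   = factor(c("Joey", "Joey", "Deedee", "Deedee",
                    "Johnny", "Johnny", "Tommy", "Tommy")),
  to     = factor(c("Joey", "Deedee", "Joey", "Deedee",
                    "Johnny", "Tommy", "Johnny", "Tommy")),
  weight = 1L
)

df1 <- subset(df, group == 1)

# subset() keeps every factor level, even ones no longer present
print(levels(df1$from))
print(nlevels(df1$from))          # still 4

# ...but group 1 only uses the codes 1 (Deedee) and 2 (Joey)
print(max(as.integer(df1$from)))  # 2

# Without dims, sparseMatrix() sizes the result from the largest i and j,
# which is why the first idea yields a 2 x 2 matrix
mat1 <- sparseMatrix(i = as.integer(df1$from), j = as.integer(df1$to),
                     x = df1$weight)
print(dim(mat1))                  # 2 2
```

So the levels are not actually lost; only the matrix size has to be supplied explicitly.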
Second Idea
If I set df$weight to zero for the group I'm not interested in and create the submatrix
df2 <- df
df2[df2$group == 2, 4] <- 0
mat2 <- sparseMatrix(i = as.numeric(df2$from), j = as.numeric(df2$to),
                     x = df2$weight,
                     dimnames = list(levels(df$from), levels(df$to)))
the matrix has the right dimension and I can easily carry along the factor levels as dimension names, but the matrix now contains zeros:
4 x 4 sparse Matrix of class "dgCMatrix"
       Deedee Joey Johnny Tommy
Deedee      1    1      .     .
Joey        1    1      .     .
Johnny      .    .      0     0
Tommy       .    .      0     0
This is also not an option: row normalization turns the explicit zeros in the all-zero rows into NaNs (0/0), and I run into trouble when I transform the matrix into a graph and perform network analysis.
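A minimal sketch of where the NaNs come from, plus one possible guard. It assumes "row normalization" means dividing each stored entry by its row sum, and it rebuilds df with data.frame() (same factor levels as the dput() output). Replacing zero row sums with 1 is an illustrative choice, not something from the question.

```r
library(Matrix)

# Same data as the dput() output above, built with data.frame() for brevity
df <- data.frame(
  group  = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  from   = factor(c("Joey", "Joey", "Deedee", "Deedee",
                    "Johnny", "Johnny", "Tommy", "Tommy")),
  to     = factor(c("Joey", "Deedee", "Joey", "Deedee",
                    "Johnny", "Tommy", "Johnny", "Tommy")),
  weight = 1L
)

# Second idea: zero out the weights of group 2, keeping all four nodes
df2 <- df
df2$weight[df2$group == 2] <- 0
mat2 <- sparseMatrix(i = as.integer(df2$from), j = as.integer(df2$to),
                     x = df2$weight,
                     dimnames = list(levels(df$from), levels(df$to)))

rs <- as.numeric(rowSums(mat2))   # row sums: 2 2 0 0

# Naive normalization of the stored entries: the explicit zeros in the
# Johnny and Tommy rows are divided by a zero row sum, giving 0/0 = NaN
naive_x <- mat2@x / rs[mat2@i + 1L]   # @i holds 0-based row indices
print(naive_x)

# Guarded version (assumption: an empty row is divided by 1 instead of 0)
rs_safe <- ifelse(rs == 0, 1, rs)
norm2 <- Diagonal(x = 1 / rs_safe) %*% mat2
print(rowSums(norm2))
```

The guard avoids the NaNs but still leaves stored zeros in the Johnny and Tommy rows.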
Here, the trick may be to remove the explicit zeros from the sparse matrix. But how?
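One way to remove the stored zeros, as a sketch assuming the Matrix package's drop0() fits your workflow. drop0() deletes entries that are stored but equal to zero and leaves the dimensions and dimnames alone, so the second idea's matrix ends up with empty rows instead of zero rows. As before, df is rebuilt with data.frame() (same factor levels as the dput() output).

```r
library(Matrix)

# Same data as the dput() output above, built with data.frame() for brevity
df <- data.frame(
  group  = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  from   = factor(c("Joey", "Joey", "Deedee", "Deedee",
                    "Johnny", "Johnny", "Tommy", "Tommy")),
  to     = factor(c("Joey", "Deedee", "Joey", "Deedee",
                    "Johnny", "Tommy", "Johnny", "Tommy")),
  weight = 1L
)

# Second idea: zero weights for group 2 produce explicit zeros
df2 <- df
df2$weight[df2$group == 2] <- 0
mat2 <- sparseMatrix(i = as.integer(df2$from), j = as.integer(df2$to),
                     x = df2$weight,
                     dimnames = list(levels(df$from), levels(df$to)))

# drop0() removes the stored zeros; dims and dimnames are kept
mat2_clean <- drop0(mat2)
print(mat2_clean)
```

This still builds the full matrix first, so building directly from the group's rows avoids storing the zeros at all.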
In any case, the solution must be as efficient as possible because the matrices get very large.
Basically your first idea, but with the full dimensions passed explicitly via dims:
mat1 <- sparseMatrix(i = as.numeric(df1$from), j = as.numeric(df1$to),
x = df1$weight,
dims = c(length(levels(df$from)), length(levels(df$to))),
dimnames = list(levels(df$from), levels(df$to)))
#4 x 4 sparse Matrix of class "dgCMatrix"
#       Deedee Joey Johnny Tommy
#Deedee      1    1      .     .
#Joey        1    1      .     .
#Johnny      .    .      .     .
#Tommy       .    .      .     .
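If the matrix is needed for every group, the call above can be wrapped in a small function. This is a sketch: group_submatrix() is a hypothetical name, df is rebuilt with data.frame() (same factor levels as the dput() output), and nlevels() is used for dims, which works because indexing a factor keeps all of its levels. Each matrix is built from that group's rows only, so nothing is densified and no zeros are stored.

```r
library(Matrix)

# Same data as the dput() output above, built with data.frame() for brevity
df <- data.frame(
  group  = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  from   = factor(c("Joey", "Joey", "Deedee", "Deedee",
                    "Johnny", "Johnny", "Tommy", "Tommy")),
  to     = factor(c("Joey", "Deedee", "Joey", "Deedee",
                    "Johnny", "Tommy", "Johnny", "Tommy")),
  weight = 1L
)

# Hypothetical helper: full-size sparse matrix for one group.
# Indexing a factor keeps all levels, so as.integer() returns codes
# relative to the full level set and dims can be fixed up front.
group_submatrix <- function(d, g) {
  rows <- d$group == g
  sparseMatrix(i = as.integer(d$from[rows]),
               j = as.integer(d$to[rows]),
               x = d$weight[rows],
               dims = c(nlevels(d$from), nlevels(d$to)),
               dimnames = list(levels(d$from), levels(d$to)))
}

# One matrix per group, named by group id ("1", "2")
mats <- lapply(setNames(nm = unique(df$group)),
               function(g) group_submatrix(df, g))
print(mats[["1"]])
```

mats[["1"]] should match the 4 x 4 result shown above, with the Johnny and Tommy rows empty.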