Tags: r, random, sparse-matrix, adjacency-matrix

Creating a sparse matrix in R with a set number of integer values per row


I'm trying to create a sparse matrix in which each row has at most n entries, each an integer within a certain range, which I could then use as an adjacency matrix for social network analysis. For example, an 80x80 matrix where each row has 10 or fewer entries that are integers from 1 to 4. The goal is to represent the sort of data you would get from a social networking survey in which respondents select values between 1 and 4 to indicate their relationship with up to 10 of the possibilities (columns) in the survey.

I can create a sparse matrix using the "rsparsematrix" function, and its density argument lets me approximate the required total number of responses, but I can't control the number of responses per row and would have to do additional processing to convert the random values to integers within my desired range.

For example, I could start with something like:

library(Matrix)
M1 <- rsparsematrix(80, 80, density = 0.1, symmetric = FALSE)
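
The integer-conversion step can probably be avoided: "rsparsematrix" takes a `rand.x` argument, a function that generates the non-zero values. The sketch below assumes that is acceptable here; the seed is arbitrary, and the number of entries per row is still not capped.

```r
library(Matrix)

set.seed(42)  # arbitrary seed, only for reproducibility

# rand.x is called with the number of non-zero entries to generate
M1 <- rsparsematrix(80, 80, density = 0.1, symmetric = FALSE,
                    rand.x = function(n) sample(1:4, n, replace = TRUE))

table(M1@x)               # distribution of the stored values
range(rowSums(M1 != 0))   # entries per row still vary from row to row
```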

A more promising approach (from https://www.r-bloggers.com/casting-a-wide-and-sparse-matrix-in-r/) would be to generate the values and then use "transform" to convert them into a matrix. This allows me to control the integer values, but still doesn't get the limited number of responses per row.

Example code from the blog follows:

set.seed(11)

N = 10
data = data.frame(
  row = sample(1:3, N, replace = TRUE),
  col = sample(LETTERS, N, replace = TRUE),
  value = sample(1:3, N, replace = TRUE))

data = transform(data,
                 row = factor(row),
                 col = factor(col))

This could be tweaked to give the required 80x80 matrix, but it doesn't solve the problem of limiting the responses per row. Also, duplicate entries for the same row/column combination are resolved by taking their sum, which can produce out-of-range values.
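
The summing behaviour is easy to reproduce in isolation. The following is a minimal sketch with made-up indices, assuming the matrix is built with `sparseMatrix` from the Matrix package:

```r
library(Matrix)

# Two of the three entries land on the same cell (row 1, column 2)
i <- c(1, 1, 2)
j <- c(2, 2, 3)
x <- c(3, 4, 1)

m <- sparseMatrix(i = i, j = j, x = x, dims = c(3, 3))

m[1, 2]  # the duplicated cell holds 3 + 4 = 7, outside the 1-4 range
m[2, 3]  # the unique cell keeps its own value, 1
```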

Any suggestions would be most appreciated.

As a bonus question, how would you then create random rows of null responses? For example within the 80*80 matrix, how might you introduce 40 random rows with no values? As in the description above, this would correspond to missing survey data.


Solution

  • You can try to build the sparse matrix up from its row (i), column (j) and value (x) components. This involves sampling subject to your row and value constraints.

    # constraints
    values <- 1:4
    maxValuesPerRow <- 10
    nrow <- 80
    ncol <- 80
    
    # sample how many values each row gets (between 1 and maxValuesPerRow)
    set.seed(1)
    nValuesForEachRow <- sample(maxValuesPerRow, nrow, replace=TRUE)
    
    # create matrix
    library(Matrix)
    i <- rep(seq_len(nrow), nValuesForEachRow)                       # row index, repeated once per entry
    j <- unlist(lapply(nValuesForEachRow, sample, x=seq_len(ncol)))  # distinct columns within each row
    x <- sample(values, sum(nValuesForEachRow), replace=TRUE)        # values
    # dims fixes the size at nrow x ncol even if the last column is never sampled
    sm <- sparseMatrix(i=i, j=j, x=x, dims=c(nrow, ncol))
    

    Check the result:

    dim(sm)
    table(rowSums(sm>0))
    table(as.vector(sm))
    

    Note: you can't just sample all the columns in one call, as below, because that can pick the same column more than once within a row. Hence the per-row lapply above.

    j <- sample(seq_len(ncol), sum(nValuesForEachRow), replace=TRUE)
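
For the bonus question, one possible approach is to pick the empty rows first and set their entry count to zero before building the indices. This is only a sketch under the same constraints as above; `nEmptyRows` and `emptyRows` are names introduced here for illustration, and the seed is arbitrary.

```r
library(Matrix)

# constraints (same as above, plus the number of empty rows)
values <- 1:4
maxValuesPerRow <- 10
nrow <- 80
ncol <- 80
nEmptyRows <- 40   # assumption: how many respondents gave no answers

set.seed(2)
# how many values each row gets, then zero out the randomly chosen empty rows
nValuesForEachRow <- sample(maxValuesPerRow, nrow, replace = TRUE)
emptyRows <- sample(nrow, nEmptyRows)
nValuesForEachRow[emptyRows] <- 0

i <- rep(seq_len(nrow), nValuesForEachRow)                           # rows with count 0 contribute nothing
j <- unlist(lapply(nValuesForEachRow, function(n) sample(ncol, n)))  # distinct columns per row
x <- sample(values, length(i), replace = TRUE)                       # values
sm <- sparseMatrix(i = i, j = j, x = x, dims = c(nrow, ncol))

table(rowSums(sm > 0))  # the count for 0 should equal nEmptyRows
```

Passing `dims` matters more here than before, since the last row may well be one of the empty ones and would otherwise be dropped from the inferred dimensions.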