
Construct a sparse matrix from text in R


I need to construct a 30915-row by 31193-column matrix in R. I have a csv file that looks like this:

1:1 2:116 3:65 6:1 12:1 10025:1 25091:1 25836:1 31193:1
1:1 2:70 3:50 11:1 12:1 10025:1 23671:1
1:1 2:42 6:1 12:1 10025:1 10378:1 24213:1
1:1 2:105 3:73 11:1 12:1 10025:1 22547:1
...[total 30915 lines]

The line number is the row index. On every line, each number before a colon is a column index, and the number after the colon is the value in that column. All entries not listed in the file are 0.
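In other words, each line lists the non-zero entries of one row in coordinate (triplet) form: row `i`, column `j`, value `x`. A minimal base-R sketch of turning a single line (the first example line, treated as row 1) into such triplets; the variable names are illustrative only:

```r
# One line of the input file: row 1 in the example above
line <- "1:1 2:116 3:65 6:1 12:1 10025:1 25091:1 25836:1 31193:1"

# Split the line into "col:value" tokens, then split each token at the colon
tokens <- strsplit(line, " ", fixed = TRUE)[[1]]
pairs  <- do.call(rbind, strsplit(tokens, ":", fixed = TRUE))

# Build (i, j, x) triplets; i is the line number (1 for this line)
triplets <- data.frame(
  i = 1L,
  j = as.integer(pairs[, 1]),
  x = as.numeric(pairs[, 2])
)
print(triplets)
```

Doing this for every line and stacking the results gives exactly the `i`, `j`, `x` vectors that a sparse-matrix constructor needs.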

How can I convert this csv file to a sparse matrix in R?

Thanks for the help!


Solution

  • library(Matrix)  # provides sparseMatrix()
    
    dd = readLines("your_file.ext")
    
    # should give you something like this:
    dd = c("1:1 2:116 3:65 6:1 12:1 10025:1 25091:1 25836:1 31193:1",
    "1:1 2:70 3:50 11:1 12:1 10025:1 23671:1",
    "1:1 2:42 6:1 12:1 10025:1 10378:1 24213:1",
    "1:1 2:105 3:73 11:1 12:1 10025:1 22547:1")
    
    # split each line into "col:value" tokens (on runs of whitespace, ends trimmed)
    dd = strsplit(trimws(dd), split = "[[:space:]]+")
    # split each token at ":" to get c(col1, val1, col2, val2, ...) per line;
    # lapply (not sapply) keeps one list element per line even when every
    # line happens to have the same number of entries
    dd = lapply(dd, function(x) as.integer(unlist(strsplit(x, split = ":", fixed = TRUE))))
    col_val = matrix(unlist(dd), ncol = 2, byrow = TRUE)
    row = rep(seq_along(dd), lengths(dd) / 2)
    # dims fixes the size even if trailing rows/columns are all zero;
    # for the full file use dims = c(30915, 31193)
    M = sparseMatrix(i = row, j = col_val[, 1], x = col_val[, 2],
                     dims = c(length(dd), 31193))
    M
    # 4 x 31193 sparse Matrix of class "dgCMatrix"
    #
    # [1,] 1 116 65 . . 1 . . . . . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
    # [2,] 1  70 50 . . . . . . . 1 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
    # [3,] 1  42  . . . 1 . . . . . 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
    # [4,] 1 105 73 . . . . . . . 1 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
    #
    # [1,] ......
    # [2,] ......
    # [3,] ......
    # [4,] ......
    #
    #  .....suppressing columns in show(); maybe adjust 'options(max.print= *, width = *)'
    #  ..............................
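The same steps can be wrapped in a reusable function. This is a sketch, not part of the original answer: `read_colon_sparse` is a hypothetical name, and it assumes only the `Matrix` package. It drops empty tokens so that a blank line becomes an all-zero row, and it lets you pass the full dimensions explicitly.

```r
library(Matrix)

# Hypothetical helper: line k of `path` holds "col:value" pairs for row k
read_colon_sparse <- function(path, nrow = NULL, ncol = NULL) {
  lines  <- readLines(path)
  tokens <- strsplit(trimws(lines), "[[:space:]]+")
  # Remove empty tokens so blank lines yield rows with no entries
  tokens <- lapply(tokens, function(t) t[nzchar(t)])
  n_per_row <- lengths(tokens)

  # Split every "col:value" token at the colon
  pairs <- strsplit(unlist(tokens), ":", fixed = TRUE)
  j <- as.integer(vapply(pairs, `[`, character(1), 1))
  x <- as.numeric(vapply(pairs, `[`, character(1), 2))
  # Row index = line number, repeated once per entry on that line
  i <- rep(seq_along(tokens), n_per_row)

  if (is.null(nrow)) nrow <- length(lines)
  if (is.null(ncol)) ncol <- max(j)
  sparseMatrix(i = i, j = j, x = x, dims = c(nrow, ncol))
}

# Usage on a small temporary file with two of the example lines
tmp <- tempfile()
writeLines(c("1:1 2:116 3:65 6:1 12:1 10025:1 25091:1 25836:1 31193:1",
             "1:1 2:70 3:50 11:1 12:1 10025:1 23671:1"), tmp)
M <- read_colon_sparse(tmp, ncol = 31193)
dim(M)  # 2 31193
```

For the full file, the call would be `read_colon_sparse("your_file.ext", nrow = 30915, ncol = 31193)`. Passing `nrow` and `ncol` explicitly guards against the matrix coming out smaller than intended when the last rows or the highest column indices contain no entries.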