r, machine-learning, sparse-matrix

Create sparse matrix from data frame


I’m trying to create a sparse matrix from a data frame without first building a dense matrix, which causes serious memory issues.

I found the following SO post, where a solution seems to have been found: Create Sparse Matrix from a data frame

I've tried this solution, but it doesn't work for me, perhaps because my UserID and MovieID values don't start at 1.

Here is my sample code:

library(Matrix)

UserID<-c(10090,10090,10090,10316,10316)
MovieID <-c(63155,63530,63544,63155,63545)
Rating <-c(2,2,1,2,1)
trainingData<-data.frame(UserID,MovieID,Rating)
trainingData

UIMatrix <- sparseMatrix(i = trainingData$UserID,
                         j = trainingData$MovieID,
                         x = trainingData$Rating)

dim(UIMatrix)
# [1] 10316 63545

I expected to get a 2 x 3 matrix, but the dims correspond to the maximum user and movie IDs.

I've tried the second solution suggested in the post, but it doesn't work with my data either.

Can anyone give some advice?


Solution

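  • You can convert your IDs to consecutive indices starting at one with as.integer(as.factor(.)). sparseMatrix() sizes the result from the largest i and j it sees, so compact indices give a compact matrix.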

    UIMatrix <- sparseMatrix(i = as.integer(as.factor(trainingData$UserID)),
                             j = as.integer(as.factor(trainingData$MovieID)),
                             x = trainingData$Rating)
    
    dim(UIMatrix)
    # [1] 2 4
    
    dimnames(UIMatrix) <- list(sort(unique(trainingData$UserID)),
                               sort(unique(trainingData$MovieID)))
    
    UIMatrix
    # 2 x 4 sparse Matrix of class "dgCMatrix"
    #       63155 63530 63544 63545
    # 10090     2     2     1     .
    # 10316     2     .     .     1
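
The same idea can be sketched with the factors stored once, so that the row and column labels come from the same levels used to build the indices. This is a minimal variant, not the only way to do it; the names `users` and `movies` are introduced here for illustration and are not from the original post. It relies on the `dims` and `dimnames` arguments of `sparseMatrix()`.

```r
library(Matrix)

# Sample data from the question
trainingData <- data.frame(
  UserID  = c(10090, 10090, 10090, 10316, 10316),
  MovieID = c(63155, 63530, 63544, 63155, 63545),
  Rating  = c(2, 2, 1, 2, 1)
)

# Build each factor once; its integer codes are 1-based, consecutive indices
users  <- factor(trainingData$UserID)
movies <- factor(trainingData$MovieID)

# Pass dims and dimnames directly so the labels always match the indices
UIMatrix <- sparseMatrix(i = as.integer(users),
                         j = as.integer(movies),
                         x = trainingData$Rating,
                         dims = c(nlevels(users), nlevels(movies)),
                         dimnames = list(levels(users), levels(movies)))

dim(UIMatrix)
# [1] 2 4

# Look up a rating by the original IDs (as character labels)
UIMatrix["10316", "63545"]
# [1] 1
```

Keeping the factors around also makes it possible to map a row or column position back to the original ID with `levels(users)` and `levels(movies)`.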