python, numpy, machine-learning, kaggle, matrix-factorization

numpy.core._exceptions.MemoryError: Unable to allocate space for array


Error:

numpy.core._exceptions.MemoryError: Unable to allocate 362. GiB for an array with shape (2700000, 18000) and data type float64

https://www.kaggle.com/datasets/netflix-inc/netflix-prize-data

I'm working on the Netflix Prize data set, which has a large number of movies and user IDs. My task is to apply matrix factorization, so I need to create a 2,700,000 x 18,000 matrix that stores integers in the range 1 to 5. I have tried many approaches but still cannot create a matrix of that size. I tried forcing the dtype to uint8, but the shape of the matrix I get is wrong. Please help me solve this.


Solution

  • Your 2,700,000 by 18,000 matrix had better be sparse, or you will need a computer with a very large amount of memory. One dense copy takes about 362 GiB as float64 (8 bytes per cell, which is what the error reports) and still about 45 GiB as uint8, in both cases as a single contiguous allocation.

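    1. Use a more memory-efficient matrix representation, such as a sparse one like scipy.sparse.csc_matrix. This only pays off if most entries of the matrix are zero, that is, most user-movie pairs have no rating.
    2. Modify your algorithm to work on submatrices, so that only one block has to be in memory at a time.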
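
Both suggestions can be sketched together in Python. This is a minimal illustration under stated assumptions, not a definitive implementation: it assumes the ratings have already been parsed into three equal-length arrays (user IDs, movie IDs, ratings), and the function names build_rating_matrix and iter_row_blocks as well as the tiny sample data are hypothetical. It uses scipy.sparse.csr_matrix instead of the csc_matrix mentioned above because row blocks slice cheaply in CSR, and it stores float32 rather than uint8 so that later arithmetic cannot overflow.

```python
import numpy as np
from scipy import sparse


def build_rating_matrix(user_ids, movie_ids, ratings):
    """Build a sparse users x movies matrix from (user, movie, rating) triplets.

    Hypothetical helper: raw IDs are remapped to consecutive 0-based indices,
    so gaps in the ID range do not create empty rows or columns.
    """
    # np.unique returns the sorted unique IDs plus, for each input element,
    # the index of its ID in that sorted array.
    users, row_idx = np.unique(np.asarray(user_ids), return_inverse=True)
    movies, col_idx = np.unique(np.asarray(movie_ids), return_inverse=True)
    data = np.asarray(ratings, dtype=np.float32)
    matrix = sparse.csr_matrix(
        (data, (row_idx, col_idx)), shape=(len(users), len(movies))
    )
    return matrix, users, movies


def iter_row_blocks(matrix, block_size):
    """Yield (start_row, sparse_block) pairs covering the matrix row by row."""
    for start in range(0, matrix.shape[0], block_size):
        yield start, matrix[start:start + block_size]


# Size of the dense float64 array from the error message, in GiB.
dense_gib = 2_700_000 * 18_000 * np.dtype(np.float64).itemsize / 2**30

# Tiny made-up sample; the real data would be parsed from the Kaggle files.
user_ids = np.array([10, 10, 2_649_429, 7, 7])
movie_ids = np.array([1, 3, 3, 2, 17_770])
ratings = np.array([5, 3, 4, 1, 2])

R, users, movies = build_rating_matrix(user_ids, movie_ids, ratings)

# Process the matrix two rows at a time; only one block is densified at once.
blocks = [(start, block.toarray()) for start, block in iter_row_blocks(R, 2)]
```

Only the stored ratings take memory here, so the cost grows with the number of ratings rather than with rows times columns. Densifying one row block at a time with toarray() gives a submatrix small enough to fit in RAM, which is one way to apply the second suggestion.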