Search code examples
pythonpython-2.7numpymatrix-multiplicationadjacency-matrix

matrix operations with gigantic matrices in python


Does anybody know how to work with gigantic matrices in python? I have to work with adjacency matrices of shape (10^6,10^6) and perform operations including addition, scaling and dot product. Using numpy arrays I had problems with the ram.


Solution

  • How about something like this...

    import numpy as np
    
    # Create large arrays x and y.
    # Note they are 1e4 not 1e6 b/c of memory issues creating random numpy matrices (CookieOfFortune) 
    # However, the same principles apply to larger arrays
    x = np.random.randn(10000, 10000)
    y = np.random.randn(10000, 10000)
    
    # Create memory maps for x and y arrays
    xmap = np.memmap('xfile.dat', dtype='float32', mode='w+', shape=x.shape)
    ymap = np.memmap('yfile.dat', dtype='float32', mode='w+', shape=y.shape)
    
    # Fill memory maps with data
    xmap[:] = x[:]
    ymap[:] = y[:]
    
    # Create memory map for out of core dot product result
    prodmap = np.memmap('prodfile.dat', dtype='float32', mode='w+', shape=x.shape)
    
    # Due out of core dot product and write data
    prodmap[:] = np.memmap.dot(xmap, ymap)
    
    # Create memory map for out of core addition result
    addmap = np.memmap('addfile.dat', dtype='float32', mode='w+', shape=x.shape)
    
    # Due out of core addition and write data
    addmap[:] = xmap + ymap
    
    # Create memory map for out of core scaling result
    scalemap = np.memmap('scalefile.dat', dtype='float32', mode='w+', shape=x.shape)
    
    # Define scaling constant
    scale = 1.3
    
    # Do out of core  scaling and write data
    scalemap[:] = scale * xmap
    

    This code will create files xfile.dat, yfile.dat, ect that contain the arrays in binary format. To access them later you simply need to do np.memmap(filename). Other arguments to np.memmap are optional, but reccomended (arguments like dtype, shape, ect.).