python scipy sparse-matrix shared-memory symmetric

Easy way to store SciPy sparse symmetric matrix in a local file, load it into shared memory (shm), and reconstruct it from shm?


Is there an easy way to store the upper triangle (including the diagonal) of a SciPy sparse matrix in a local file, load the file into shared memory (shm), and reconstruct the sparse matrix from the shm?

ChatGPT 4o gave what seems like an overly complex solution that handles three separate arrays (data, indices, and indptr). Is that necessary?

import numpy as np
import scipy.sparse as sp
from multiprocessing import shared_memory

# Example matrix creation
matrix = sp.random(100, 100, density=0.1, format='csr')

# Step 1: Extract and save the upper triangular part (including the diagonal)
upper_triangular_matrix = sp.triu(matrix)
sp.save_npz('upper_triangular_matrix.npz', upper_triangular_matrix)

# Step 2: Load the matrix from the file
loaded_matrix = sp.load_npz('upper_triangular_matrix.npz')

# Copy the matrix into shared memory
# A CSR matrix is defined by three 1-D arrays: data, indices, and indptr
data = loaded_matrix.data
indices = loaded_matrix.indices
indptr = loaded_matrix.indptr

# Create shared memory blocks for each array
shm_data = shared_memory.SharedMemory(create=True, size=data.nbytes)
shm_indices = shared_memory.SharedMemory(create=True, size=indices.nbytes)
shm_indptr = shared_memory.SharedMemory(create=True, size=indptr.nbytes)

# Copy the arrays into the shared memory blocks
np.copyto(np.ndarray(data.shape, dtype=data.dtype, buffer=shm_data.buf), data)
np.copyto(np.ndarray(indices.shape, dtype=indices.dtype, buffer=shm_indices.buf), indices)
np.copyto(np.ndarray(indptr.shape, dtype=indptr.dtype, buffer=shm_indptr.buf), indptr)

# Access the shared memory blocks
shared_data = np.ndarray(data.shape, dtype=data.dtype, buffer=shm_data.buf)
shared_indices = np.ndarray(indices.shape, dtype=indices.dtype, buffer=shm_indices.buf)
shared_indptr = np.ndarray(indptr.shape, dtype=indptr.dtype, buffer=shm_indptr.buf)

# Create a shared memory CSR matrix
shared_csr_matrix = sp.csr_matrix((shared_data, shared_indices, shared_indptr), shape=loaded_matrix.shape)

# Accessing the shared memory matrix
print(shared_csr_matrix)

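# Clean up shared memory
# Drop every object that still views the shared buffers before closing them;
# the views must not be kept or used once the blocks are closed
del shared_csr_matrix, shared_data, shared_indices, shared_indptr
shm_data.close()
shm_data.unlink()
shm_indices.close()
shm_indices.unlink()
shm_indptr.close()
shm_indptr.unlink()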
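
The code above stores only the upper triangle but never rebuilds the full symmetric matrix. Below is a minimal sketch of that last step, assuming the stored matrix really is the upper triangle (diagonal included) of a symmetric matrix. The names `S`, `upper`, and `full` are illustrative and not part of the original code.

```python
import numpy as np
import scipy.sparse as sp

# Build a small symmetric sparse matrix to test against (illustrative data only)
M = sp.random(6, 6, density=0.4, format="csr", random_state=0)
S = (M + M.T).tocsr()

# This is the part that would be saved to disk / placed in shared memory
upper = sp.triu(S, format="csr")

# Mirror only the strictly upper part (k=1) so the diagonal is not added twice
full = (upper + sp.triu(upper, k=1).T).tocsr()

# Compare against the original symmetric matrix
same = np.allclose(full.toarray(), S.toarray())
print(same)
```

The same expression should work when `upper` is a CSR matrix whose arrays live in shared memory, since it only reads from `upper`. Note that `full` itself is allocated in ordinary process memory, so rebuilding it gives up the storage saved by keeping only the triangle.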

Solution

  • It turned out that my matrix was not really sparse, so SciPy's sparse matrix format didn't save any memory. The main property of my matrix is that it is symmetric.

    ChatGPT 4o gives these basic steps for handling a symmetric matrix as a full (dense) array while keeping only its upper triangle; it's basically what I was asking for:

    Step 1: Save the Upper-Triangular Part to a File

    First, generate and save the upper-triangular part of a symmetric matrix if it hasn't been saved yet.

    import numpy as np
    import os
    from multiprocessing import shared_memory
    
    filename = "upper_diag.npy"
    
    # Function to create a symmetric matrix and save the upper-diagonal part
    def create_and_save_upper_diag(filename, size=5):
        A = np.random.rand(size, size)
        A = (A + A.T) / 2  # Make A symmetric
        upper_diag_A = np.triu(A)  # Zero the entries below the diagonal; shape stays (size, size)
        np.save(filename, upper_diag_A)  # Save to a file
    
    # Check if the file exists, if not create and save the matrix
    if not os.path.exists(filename):
        create_and_save_upper_diag(filename)
    

    Step 2: Load from File into SHM

    Once you have the file, you can load it into shared memory.

    # Function to load a matrix from a .npy file into shared memory
    def load_to_shm(filename):
        # Load matrix from file
        upper_diag_A = np.load(filename)
    
        # Create shared memory block
        shm = shared_memory.SharedMemory(create=True, size=upper_diag_A.nbytes)
    
        # Create a NumPy array backed by shared memory
        upper_diag_A_shm = np.ndarray(upper_diag_A.shape, dtype=upper_diag_A.dtype, buffer=shm.buf)
        upper_diag_A_shm[:] = upper_diag_A[:]  # Copy data to shared memory
    
        return upper_diag_A_shm, shm
    
    # Load the matrix into shared memory
    upper_diag_A_shm, shm = load_to_shm(filename)
    

    Example of Accessing the Matrix

    Because only the upper triangle holds data, use the symmetry of the matrix to read an element below the diagonal: element (i, j) with i > j is stored at (j, i).

    def get_element(matrix, i, j):
        if i <= j:
            return matrix[i, j]
        else:
            return matrix[j, i]
    
    # Example of accessing elements
    print(get_element(upper_diag_A_shm, 1, 4))  # Access an element
    

    Cleaning Up

    When you are done using the shared memory, it’s important to clean it up properly.

    # Cleanup
    upper_diag_A_shm = None  # Drop the NumPy view before closing the block
    shm.close()
    shm.unlink()  # Remove the shared memory block
    

    This method combines file operations with shared memory handling in Python, making it suitable for applications where matrices are reused across different runs or processes, potentially saving time and memory by avoiding recomputation.
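
One caveat: `np.triu` returns an array with the same `(size, size)` shape, so the steps above still store and share n*n values. If memory is the concern, a packed layout holding only the n*(n+1)/2 upper-triangle entries is one option. The following is a sketch under that assumption; `pack_upper` and `packed_index` are hypothetical helper names, and the second `SharedMemory(name=...)` handle stands in for what a separate worker process would do.

```python
import numpy as np
from multiprocessing import shared_memory


def pack_upper(A):
    """Return the upper triangle of square matrix A (diagonal included) as a 1-D array."""
    n = A.shape[0]
    return A[np.triu_indices(n)]  # row-major order, length n * (n + 1) // 2


def packed_index(n, i, j):
    """Position of element (i, j) of an n x n symmetric matrix in the packed array."""
    if i > j:
        i, j = j, i  # use symmetry to map into the upper triangle
    # Rows 0..i-1 hold n, n-1, ..., n-i+1 entries; then add the offset within row i
    return i * n - i * (i - 1) // 2 + (j - i)


# Illustrative symmetric matrix
rng = np.random.default_rng(0)
n = 5
A = rng.random((n, n))
A = (A + A.T) / 2

packed = pack_upper(A)

# Producer side: copy the packed values into a new shared memory block
shm = shared_memory.SharedMemory(create=True, size=packed.nbytes)
packed_shm = np.ndarray(packed.shape, dtype=packed.dtype, buffer=shm.buf)
packed_shm[:] = packed

# Consumer side: attach by name (another process would need n, the dtype, and shm.name)
attached = shared_memory.SharedMemory(name=shm.name)
view = np.ndarray((n * (n + 1) // 2,), dtype=np.float64, buffer=attached.buf)

value = view[packed_index(n, 4, 1)]  # element below the diagonal, read via symmetry
all_match = all(
    view[packed_index(n, i, j)] == A[i, j] for i in range(n) for j in range(n)
)
print(all_match)

# Drop the NumPy views before closing, then remove the block once
del view, packed_shm
attached.close()
shm.close()
shm.unlink()
```

For large n this roughly halves the storage compared with the full array, at the cost of computing an index for every access. The packed 1-D array can be written with `np.save` and read with `np.load` in the same way as the 2-D array in Step 1.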