"shapes not aligned" error when performing Singular Value Decomposition using scipy.sparse.linalg


I am trying to use Singular Value Decomposition (SVD) to predict missing values in a sparse matrix. Chapter 4 of the DataCamp course "Building Recommendation Engines in Python" gives an example of doing this with movie ratings. As a first step, I am trying to replicate that example on my local PC in a Jupyter Notebook. However, when I try to multiply the U_sigma and Vt matrices, which are built from the output of the "svds" function, I get an error:

    ValueError: shapes (671,) and (6,9161) not aligned: 671 (dim 0) != 6 (dim 0)

I am using this dataset: https://www.kaggle.com/rounakbanik/the-movies-dataset/version/7?select=ratings_small.csv

Here is the code I am trying to run:

    import pandas as pd
    
    filename = 'ratings_small.csv'
    df = pd.read_csv(filename)
    
    df.head()
    user_ratings_df = df.pivot(index='userId', columns='movieId', values='rating')
    
    # Get the average rating for each user 
    avg_ratings = user_ratings_df.mean(axis=1)
    
    # Center each user's ratings around 0
    user_ratings_centered = user_ratings_df.sub(avg_ratings, axis=1)
    
    # Fill in all missing values with 0s
    user_ratings_centered.fillna(0, inplace=True)
    # Print the mean of each row (one value per user)
    print(user_ratings_centered.mean(axis=1))
    
    ######################
    # Import the required libraries 
    from scipy.sparse.linalg import svds
    import numpy as np
    
    # Decompose the matrix
    U, sigma, Vt = svds(user_ratings_centered)
    
    ## Now that you have your three factor matrices, you can multiply them back together to get complete ratings data 
    # without missing values. In this exercise, you will use numpy's dot product function to multiply U and sigma first, 
    # then the result by Vt. You will then be able to add the average ratings for each row to find your final ratings.
    
    # Dot product of U and sigma
    U_sigma = np.dot(U, sigma)
    
    # Dot product of result and Vt
    U_sigma_Vt = np.dot(U_sigma, Vt)


Solution

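  • One line of code was missing. "svds" returns sigma as a 1-D array of singular values, so np.dot(U, sigma) collapses to a vector of shape (671,) instead of a (671, 6) matrix, and the following dot product with Vt (shape (6, 9161)) fails. After running "svds" to decompose the matrix, we need this line: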

    # Convert sigma into a diagonal matrix
    sigma = np.diag(sigma)
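
To see why the diagonal conversion matters, here is a minimal sketch on synthetic data. The small random matrix, the 70% missing rate, and the variable names are assumptions made for illustration, not the Kaggle dataset from the question. It compares the shapes with and without np.diag, then adds the row averages back as described in the exercise comments.

```python
import numpy as np
from scipy.sparse.linalg import svds

# Hypothetical stand-in for the user x movie ratings table (20 users, 30 movies)
rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=(20, 30)).astype(float)
mask = rng.random(ratings.shape) < 0.7
mask[:, 0] = False            # keep at least one rating per user
ratings[mask] = np.nan        # mark the remaining masked entries as missing

# Center each user's ratings around 0, then fill missing values with 0
avg_ratings = np.nanmean(ratings, axis=1)
centered = np.nan_to_num(ratings - avg_ratings[:, np.newaxis])

# Decompose; k=6 matches the svds default used in the question
U, sigma, Vt = svds(centered, k=6)
print(U.shape, sigma.shape, Vt.shape)

# sigma is 1-D, so this dot product collapses to one value per user
collapsed = np.dot(U, sigma)
print(collapsed.shape)

# With a diagonal sigma the result keeps one column per singular value
U_sigma = np.dot(U, np.diag(sigma))
U_sigma_Vt = np.dot(U_sigma, Vt)

# Same product via broadcasting, without building the diagonal matrix
U_sigma_Vt_alt = (U * sigma) @ Vt

# Add each user's average back to get the predicted ratings
predicted = U_sigma_Vt + avg_ratings[:, np.newaxis]
print(predicted.shape)
```

Separately from the shape error, it may be worth checking the centering step in the question: for a Series argument, DataFrame.sub(avg_ratings, axis=1) matches the Series index against the column labels (movieId), whereas axis=0 matches it against the row index (userId), which is what subtracting per-user averages needs.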