I am trying to use Singular Value Decomposition (SVD) to predict missing values in a sparse matrix. Chapter 4 of the "Building Recommendation Engines in Python" Datacamp course provides an example of doing this with movie ratings. As a first step, I have been trying to replicate this Datacamp example on my local PC in a Jupyter Notebook. However, when I multiply U_sigma (the product of the U and sigma outputs of the "svds" function) by Vt, I get an error:
ValueError: shapes (671,) and (6,9161) not aligned: 671 (dim 0) != 6 (dim 0)
I am using this dataset: https://www.kaggle.com/rounakbanik/the-movies-dataset/version/7?select=ratings_small.csv
Here is the code I am trying to run:
import pandas as pd

filename = 'ratings_small.csv'
df = pd.read_csv(filename)
df.head()

user_ratings_df = df.pivot(index='userId', columns='movieId', values='rating')

# Get the average rating for each user
avg_ratings = user_ratings_df.mean(axis=1)

# Center each user's ratings around 0
user_ratings_centered = user_ratings_df.sub(avg_ratings, axis=1)

# Fill in all missing values with 0s
user_ratings_centered.fillna(0, inplace=True)

# Print the mean of each column
print(user_ratings_centered.mean(axis=1))

######################

# Import the required libraries
from scipy.sparse.linalg import svds
import numpy as np

# Decompose the matrix
U, sigma, Vt = svds(user_ratings_centered)

## Now that you have your three factor matrices, you can multiply them back together to get complete ratings data
# without missing values. In this exercise, you will use numpy's dot product function to multiply U and sigma first,
# then the result by Vt. You will then be able to add the average ratings for each row to find your final ratings.

# Dot product of U and sigma
U_sigma = np.dot(U, sigma)

# Dot product of result and Vt
U_sigma_Vt = np.dot(U_sigma, Vt)
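A line of code was missing. "svds" returns sigma as a 1-D array of singular values rather than a matrix, so np.dot(U, sigma) collapses to a 1-D vector of shape (671,), which cannot then be multiplied by Vt. After running "svds" to decompose the matrix, we need this line: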
# Convert sigma into a diagonal matrix
sigma = np.diag(sigma)
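To see why the diagonal conversion matters, here is a minimal sketch on a small synthetic matrix. The 5 x 4 random matrix, k=2, and the names ratings, collapsed, sigma_diag and approx are assumptions made for illustration; they stand in for the centered ratings data and are not part of the original exercise.

```python
import numpy as np
from scipy.sparse.linalg import svds

# Synthetic stand-in for the centered ratings matrix (5 users x 4 items).
rng = np.random.default_rng(0)
ratings = rng.normal(size=(5, 4))

# Ask for k=2 singular values; k must be smaller than min(ratings.shape).
U, sigma, Vt = svds(ratings, k=2)
print(U.shape, sigma.shape, Vt.shape)  # (5, 2) (2,) (2, 4)

# With a 1-D sigma, the dot product collapses to a vector of shape (5,),
# which is what triggers the "shapes not aligned" error on the next step.
collapsed = np.dot(U, sigma)
print(collapsed.shape)  # (5,)

# Converting sigma into a diagonal matrix keeps the product two-dimensional.
sigma_diag = np.diag(sigma)
approx = np.dot(np.dot(U, sigma_diag), Vt)
print(approx.shape)  # (5, 4)
```

Separately, it may be worth checking the centering step: sub(avg_ratings, axis=1) aligns the per-user averages with the movieId columns, whereas subtracting each user's average from that user's own row would use axis=0.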