I have a data frame containing similarity judgements for pairs of voices from 9 listeners. I'm trying to run multidimensional scaling with individual differences scaling, so that I can see the relationships between the voices on a plot. I am using the smacof package.
Data frame: https://gist.github.com/al3ka/2b4948d4c13baecae75880dd7f1d5e2c
Here is my code:
library(dplyr)   # for %>% and filter()
library(smacof)  # for smacofIndDiff()

# Get unique voices
voices <- unique(c(results$voice1, results$voice2))
# Number of voices
n_voices <- length(voices)
# Create a list to hold each subject's dissimilarity matrix
dissimilarity_list <- list()
# Create a square dissimilarity matrix for each subject
for (subject_id in unique(results$subject)) {
  # Filter responses for the current subject
  subject_responses <- results %>% filter(subject == subject_id)
  # Initialize an empty matrix
  dissimilarity_matrix <- matrix(NA, nrow = n_voices, ncol = n_voices,
                                 dimnames = list(voices, voices))
  # Fill the matrix with responses
  for (i in 1:nrow(subject_responses)) {
    voice1 <- subject_responses$voice1[i]
    voice2 <- subject_responses$voice2[i]
    response <- subject_responses$response[i]
    dissimilarity_matrix[voice1, voice2] <- response
    dissimilarity_matrix[voice2, voice1] <- response # Assuming symmetry
  }
  # Append to the list
  dissimilarity_list[[subject_id]] <- dissimilarity_matrix
}
# Convert list of matrices to 3D array
dissimilarity_array <- array(NA, dim = c(n_voices, n_voices, length(dissimilarity_list)))
for (i in 1:length(dissimilarity_list)) {
  dissimilarity_array[,,i] <- dissimilarity_list[[i]]
}
# Replace NA values with 0 in the dissimilarity matrices
dissimilarity_array[is.na(dissimilarity_array)] <- 0
# Perform Individual Differences Scaling (INDSCAL)
indscal_result <- smacofIndDiff(dissimilarity_array, ndim = 2)
At the point where I try to create the array, I get the error:
Error: vector memory limit of 16.0 Gb reached, see mem.maxVSize()
This happens even though my list is only about 0.09 GB. I have a brand-new MacBook Air and have worked with much larger data sets before. How can I get around this error? Is there a way to run the MDS for multiple listeners without averaging the scores for each pair of voices, since averaging would hide some of the variance in the data?
I used to run this in SPSS without problems, but I don't know how to handle multiple listeners in this R implementation without creating an array, which I presume is what causes the memory issue. Please help!
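Your issue is caused by indexing into dissimilarity_list with very large numbers, because the subject IDs are large numbers. Assigning to a numeric position in a list makes R extend the list up to that position and pad the gaps with NULL, so you end up with a huge list even though you only have 9 subjects. The array is then created with length(dissimilarity_list) as its third dimension, which is why it would need more than 16 GB. To solve this, convert the subject IDs to text when you use them to build the list, so that each subject becomes one named element: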
dissimilarity_list[[as.character(subject_id)]] <- dissimilarity_matrix
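To see the effect in isolation, here is a minimal sketch in base R. The subject IDs and the 2 x 2 matrices are made up for illustration and are not taken from your data; only the indexing behaviour matters.

```r
# Hypothetical subject IDs, chosen only to illustrate the effect
subject_ids <- c(1000001, 1000002, 1000003)

# Numeric indexing: R pads the list with NULL up to the largest index
by_number <- list()
for (id in subject_ids) {
  by_number[[id]] <- matrix(0, nrow = 2, ncol = 2)
}
length(by_number)  # 1000003, although only 3 matrices were stored

# Character indexing: one named element per subject
by_name <- list()
for (id in subject_ids) {
  by_name[[as.character(id)]] <- matrix(0, nrow = 2, ncol = 2)
}
length(by_name)    # 3
names(by_name)     # "1000001" "1000002" "1000003"

# The third array dimension now equals the number of subjects
arr <- array(unlist(by_name), dim = c(2, 2, length(by_name)))
dim(arr)           # 2 2 3
```

With the named list, length(dissimilarity_list) is 9 in your case, so the array step stays small. As far as I know, smacofIndDiff() is documented as taking a list of dissimilarity matrices, so the array conversion may not be needed at all; please check ?smacofIndDiff before relying on that.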