Multidimensional scaling: Error: vector memory limit of 16.0 Gb reached

I have a data frame containing similarity judgements for pairs of voices from 9 listeners. I'm trying to run multidimensional scaling with individual differences scaling, so that I can see the relationships between voices on a plot. I'm using the smacof package.

dataframe: https://gist.github.com/al3ka/2b4948d4c13baecae75880dd7f1d5e2c

Here is my code:

library(dplyr)   # provides %>% and filter()
library(smacof)  # provides smacofIndDiff()

# Get unique voices
voices <- unique(c(results$voice1, results$voice2))

# Number of voices
n_voices <- length(voices)

# Create a list to hold each subject's dissimilarity matrix
dissimilarity_list <- list()

# Create a square dissimilarity matrix for each subject
for (subject_id in unique(results$subject)) {
  # Filter responses for the current subject
  subject_responses <- results %>% filter(subject == subject_id)
  
  # Initialize an empty matrix
  dissimilarity_matrix <- matrix(NA, nrow = n_voices, ncol = n_voices,
                                 dimnames = list(voices, voices))
  
  # Fill the matrix with responses
  for (i in 1:nrow(subject_responses)) {
    voice1 <- subject_responses$voice1[i]
    voice2 <- subject_responses$voice2[i]
    response <- subject_responses$response[i]
    dissimilarity_matrix[voice1, voice2] <- response
    dissimilarity_matrix[voice2, voice1] <- response  # Assuming symmetry
  }
  
  # Append to the list
  dissimilarity_list[[subject_id]] <- dissimilarity_matrix
}

# Convert list of matrices to 3D array
dissimilarity_array <- array(NA, dim = c(n_voices, n_voices, length(dissimilarity_list)))
for (i in 1:length(dissimilarity_list)) {
  dissimilarity_array[,,i] <- dissimilarity_list[[i]]
}

# Replace NA values with 0 in the dissimilarity matrices
dissimilarity_array[is.na(dissimilarity_array)] <- 0

# Perform Individual Differences Scaling (INDSCAL)
indscal_result <- smacofIndDiff(dissimilarity_array, ndim = 2)

At the point where I try to create the array, I get the error:

Error: vector memory limit of 16.0 Gb reached, see mem.maxVSize()

This happens even though my list is only about 0.09 GB. I have a brand-new MacBook Air and have dealt with much larger data sets before. How can I get around this error? Is there a way of running the MDS for multiple listeners without averaging the scores for each pair of voices, since averaging would hide some of the variance in the data?

I used to run this in SPSS without problems, but I don't know how to handle multiple listeners in R without creating an array, which I presume is what causes the memory issue. Please help!

Solution

Your issue is caused by indexing into dissimilarity_list with very large numbers. The subject IDs are large numbers, and assigning to a list at a numeric position extends the list up to that position, so you end up with a huge list even though you only have 9 subjects. The array you then build from length(dissimilarity_list) would exceed 16 Gb. To solve this, convert the subject IDs to text when you use them to construct the list, i.e.

  dissimilarity_list[[as.character(subject_id)]] <- dissimilarity_matrix
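
To see why this matters, here is a minimal sketch of the two indexing behaviours. The IDs and the 2 x 2 matrices are made up for illustration (your real IDs are presumably much larger); only base R is used.

```r
# Hypothetical subject IDs, kept smallish so the demo runs instantly
ids <- c(100001, 100002)

# Numeric index: the list is padded with NULLs up to the largest ID
by_number <- list()
for (id in ids) by_number[[id]] <- matrix(0, 2, 2)
length(by_number)   # 100002, not 2

# Character index: one named element per subject
by_name <- list()
for (id in ids) by_name[[as.character(id)]] <- matrix(0, 2, 2)
length(by_name)     # 2
names(by_name)      # "100001" "100002"

# Stack the per-subject matrices into a voices x voices x subjects array
arr <- simplify2array(by_name)
dim(arr)            # 2 2 2
```

With the named list, length(dissimilarity_list) equals the number of subjects, so the rest of your code (including the positional dissimilarity_list[[i]] in the array loop) should work unchanged. As far as I recall, smacofIndDiff is documented as accepting a list of dissimilarity matrices, so you may be able to pass the list directly and skip the array; check ?smacofIndDiff to confirm.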