Tags: r, matrix, cosine-similarity

How do I show the best-matching car name and its cosine similarity value in two separate columns? (I need to extract the second-largest value)


I am using the mtcars data to calculate the cosine similarity between every pair of automobiles. There are 11 variables and 32 observations. I have created a matrix to store the results, but I don't know how to find the most similar automobile (one is enough) for each car.
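For reference, here is a minimal base-R sketch of building that 32 x 32 similarity matrix. It assumes the raw (unscaled) mtcars columns are used, as in the question; the names `x`, `norms` and `sim` are illustrative, not from the original post.

```r
# Cosine similarity between every pair of rows (cars) in mtcars.
# mtcars ships with base R (datasets package), so no extra packages are needed.
x <- as.matrix(mtcars)                       # 32 cars x 11 variables
norms <- sqrt(rowSums(x^2))                  # Euclidean length of each car's vector
sim <- tcrossprod(x) / outer(norms, norms)   # dot products divided by products of lengths

dim(sim)                 # 32 32
round(sim[1:3, 1:3], 4)  # diagonal entries are 1 (each car compared with itself)
```

`tcrossprod(x)` computes all row-by-row dot products at once and keeps the car names as row and column names.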

How do I show the best-matching car name and the similarity value in two separate columns? (I actually need to extract the second-largest value, because the maximum value, 1.0, is the car matched with itself.) Thanks.
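Taking the question literally, one way to pick the second-largest value per row is to rank each row with `order()` and take position 2. This is a sketch, assuming each car's self-similarity is the largest entry in its row; that holds unless two cars have exactly proportional rows (cosine similarity of exactly 1). Column names `Cars`, `Most_Similar` and `Cosine_Similarity` are illustrative.

```r
# Build the cosine similarity matrix from raw mtcars columns
x <- as.matrix(mtcars)
norms <- sqrt(rowSums(x^2))
sim <- tcrossprod(x) / outer(norms, norms)

# order() ranks each row from most to least similar; position 1 is the car itself,
# so position 2 is the most similar *other* car
best_idx <- apply(sim, 1, function(r) order(r, decreasing = TRUE)[2])

out <- data.frame(
  Cars              = rownames(sim),
  Most_Similar      = colnames(sim)[best_idx],
  Cosine_Similarity = sim[cbind(seq_len(nrow(sim)), best_idx)],  # matrix indexing by (row, col) pairs
  row.names = NULL
)
head(out)
```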

My matrix looks like this: 3.1. I want the result to look like this: 3.2.


Solution

  • This should do the trick:

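    # Toy stand-in for a similarity matrix (random counts, not real cosine values)
    set.seed(1)
    data <- matrix(rbinom(64, 20, 0.5), 8, 8)   # 64 values fill the 8 x 8 matrix
    rownames(data) <- colnames(data) <- LETTERS[1:8]
    m  <- apply(data, 2, max)        # largest value in each column
    wm <- apply(data, 2, which.max)  # row index where that maximum occurs
    out <- data.frame(Cars = colnames(data),
                      Most_Similar = rownames(data)[wm],
                      Cosine_Similarity = m)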
    

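    Before taking the maximum, recode the diagonal (each car's similarity with itself, which is 1) to 0, otherwise every car matches itself. Recoding every value equal to 1 would also wipe out any genuine perfect match between two different cars, and floating-point self-similarities may not be exactly 1, so target the diagonal directly:

    diag(data) <- 0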
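Applied to the real mtcars matrix, the same idea (zero the diagonal, then take each row's maximum) can be written without the two `apply()` calls by using base R's `max.col()`. This is a sketch, assuming raw mtcars columns; zeroing the diagonal is safe here because every mtcars value is non-negative, so every cosine similarity is at least 0 (for data that can produce negative similarities, use `-Inf` instead).

```r
# Build the cosine similarity matrix from raw mtcars columns
x <- as.matrix(mtcars)
norms <- sqrt(rowSums(x^2))
sim <- tcrossprod(x) / outer(norms, norms)

diag(sim) <- 0   # remove self-matches so they cannot be the row maximum

# Column index of each row's maximum; "first" uses exact comparisons
best <- max.col(sim, ties.method = "first")

out <- data.frame(
  Cars              = rownames(sim),
  Most_Similar      = colnames(sim)[best],
  Cosine_Similarity = sim[cbind(seq_len(nrow(sim)), best)],
  row.names = NULL
)
out[order(-out$Cosine_Similarity), ][1:5, ]   # five closest pairs
```

`ties.method = "first"` matters here: the default, `"random"`, treats entries within a relative tolerance of 1e-5 of the row maximum as ties (see `?max.col`), and many mtcars similarities differ by less than that.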