I am using the mtcars data to calculate the cosine similarity between every pair of automobiles. There are 11 variables and 32 observations. I have created a matrix to store the results, but I don't know how to find the most similar automobile (just one is fine) for each car.
How do I show the best-matching car name and the similarity value in two separate columns? (I actually need to extract the second-largest value, since the maximum, 1.0, is each car matched to itself.) Thanks.
My matrix looks like this: 3.1, and I want the result to look like this: 3.2
This should do the trick:
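# Toy 8 x 8 matrix standing in for the similarity matrix
data <- matrix(rbinom(64, 20, 0.5), 8, 8)
rownames(data) <- LETTERS[1:8]
# Largest value in each column, and the row where it occurs
m <- apply(data, 2, max)
wm <- apply(data, 2, which.max)
# Sanity check: indexing by (row, column) pairs returns the same values as m
data[cbind(wm, seq_len(ncol(data)))]
out <- data.frame(Cars = rownames(data),
                  Most_Similar = rownames(data)[wm],
                  Cosine_Similarity = m)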
On your real matrix, you will need to recode the 1s (each car matched to itself) to 0 before computing the maximum, so do something like this first:
data[which(data == 1, arr.ind = TRUE)] <- 0
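For the actual mtcars case, a minimal sketch might look like the following. It assumes the similarity matrix is built from the raw, unscaled columns, and it blanks out self-matches by position with `diag<-` rather than recoding values equal to 1, since the diagonal may not be exactly 1 in floating point. The names `x_unit`, `sim`, `best`, and `out` are only illustrative.

```r
# Cosine similarity between every pair of rows (cars) in mtcars.
# Assumption: raw, unscaled columns; standardise first if that suits your analysis.
x <- as.matrix(mtcars)
x_unit <- x / sqrt(rowSums(x^2))   # scale each row to unit length
sim <- tcrossprod(x_unit)          # 32 x 32 matrix of cosine similarities

# Exclude self-matches by position, so they can never be picked as the maximum.
diag(sim) <- -Inf

# For each car (row), find the column holding the largest remaining similarity.
best <- apply(sim, 1, which.max)
out <- data.frame(Cars = rownames(sim),
                  Most_Similar = colnames(sim)[best],
                  Cosine_Similarity = sim[cbind(seq_len(nrow(sim)), best)])
head(out)
```

Because the diagonal is removed before the maximum is taken, the value reported for each car is effectively the second-largest entry of its original row.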