I've got a big sparse occurrence matrix that contains product similarity data. All the products appear in the same order on both the x and y axes. A value of 1 means that the two products are the same, whereas a value of 0 means that they are different.
As follows:
P1 P2 P3 P4
P1 1 1 0 0
P2 0 1 0 1
P3 0 0 1 1
P4 0 1 0 1
In this case P1 is similar to itself and to P2, and P2 is similar to P4, so in the end P1, P2 and P4 are the same. I need to write something in R that will assign the same code to P1, P2 and P4, as follows:
Product_Name Ref_Code
P1 P1
P2 P1
P3 P3
P4 P1
Is it possible to do it in R?
Cheers,
Dario.
I agree with @Prem: as per your logic, all products are the same. I have provided a code example using the reshape2 package to put your products into long format. Even though your similarity measure does not create any difference between the products, you might use the output from melt() to sort and filter the data in a different way regarding similarity and thereby achieve what you want.
library(reshape2)
# read the similarity matrix; the first column becomes the row names
data <- read.table(text = "P1 P2 P3 P4
P1 1 1 0 0
P2 0 1 0 1
P3 0 0 1 1
P4 0 1 0 1"
, header = TRUE, stringsAsFactors = FALSE)
# move the row names into a regular column so melt() can use it as the id
data <- cbind(rownames(data), data)
names(data)[1] <- "product1"
data.melt <- melt(data
  , id.vars = "product1"
  , measure.vars = colnames(data)[2:ncol(data)]
  , variable.name = "product2"
  , value.name = "similarity"
  , factorsAsStrings = TRUE)
#check the output of melt, maybe the long format is suitable for your task
data.melt
#if you split the data by your similarity and check the unique products
#in each list, you will see that they are all the same
data.split <- split(data.melt, data.melt$similarity)
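lapply(data.split, function(x) {
  # convert both columns to character so that a factor column (product2
  # after melt()) is combined by label rather than by internal code
  unique(c(as.character(x$product1), as.character(x$product2)))
})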
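If the goal is one reference code per group of linked products, a base R sketch along the following lines might help. It assumes that similarity is symmetric (a 1 in either direction links the pair) and transitive, which is my reading of the question rather than something stated in it. `ref_codes` is a made-up helper name, not a function from reshape2 or any other package.

```r
# Hypothetical helper (not from the thread): give each product the first
# product, in matrix order, that it is linked to directly or indirectly.
ref_codes <- function(m) {
  # Assumption: a 1 in either direction links the pair (symmetric similarity).
  adj <- ((m + t(m)) > 0) * 1
  diag(adj) <- 1
  # Transitive closure: square the 0/1 matrix until no new links appear.
  repeat {
    nxt <- ((adj %*% adj) > 0) * 1
    if (all(nxt == adj)) break
    adj <- nxt
  }
  # For each row, index of the first column it is linked to.
  first_linked <- apply(adj, 1, function(r) which(r > 0)[1])
  data.frame(Product_Name = rownames(adj),
             Ref_Code = colnames(adj)[first_linked],
             stringsAsFactors = FALSE)
}

# Same matrix as in the question; the first column becomes the row names.
m <- as.matrix(read.table(text = "P1 P2 P3 P4
P1 1 1 0 0
P2 0 1 0 1
P3 0 0 1 1
P4 0 1 0 1", header = TRUE))

result <- ref_codes(m)
result
```

With the matrix from the question every product gets Ref_Code P1, because the 1 in row P3, column P4 links P3 to the others. If that entry were 0, the same helper would return P1, P1, P3, P1, which is the output asked for in the question. Repeated matrix squaring is fine for a small example but will be slow and memory-hungry on a really big sparse matrix, so a graph package would probably be a better fit there.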