r sparse-matrix

R: search across a sparse similarity matrix


I've got a big sparse occurrence matrix that contains product similarity data. All the products appear in the same order on both the x and y axes. A value of 1 means that the two products are the same, whereas a value of 0 means that they are different.

As follows:

    P1  P2  P3  P4
P1  1   1   0   0
P2  0   1   0   1
P3  0   0   1   1
P4  0   1   0   1

In this case P1 is similar to itself and to P2, and P2 is similar to P4, so P1, P2 and P4 are ultimately the same product. I need to write something in R that assigns the same code to P1, P2 and P4, as follows:

Product_Name  Ref_Code 
     P1          P1
     P2          P1
     P3          P3
     P4          P1

Is it possible to do it in R?

Cheers,

Dario.


Solution

  • I agree with @Prem: as per your logic, all products are the same. I have provided a code example that uses the reshape2 package to put your products into long format. Even though your similarity measure does not separate the products, you might use the output from melt() to sort and filter the data differently with respect to similarity and thereby achieve what you want.

    library(reshape2)
    
    data <- read.table ( text = "P1  P2  P3  P4
                              P1  1   1   0   0
                              P2  0   1   0   1
                              P3  0   0   1   1
                              P4  0   1   0   1"
                              , header = TRUE, stringsAsFactors = FALSE)
    
    
    data <- cbind(rownames(data), data)
    names(data)[1] <- "product1"
    
    data.melt <- melt(data
                 , id.vars = "product1"
                 , measure.vars = colnames(data)[2:ncol(data)]
                 , variable.name = "product2"
                 , value.name = "similarity"
                 ,factorsAsStrings = TRUE)
    
    #check the output of melt, maybe the long format is suitable for your task    
    data.melt
    
    #if you split the data by your similarity and check the unique products
    #in each list, you will see that they are all the same
    data.split <- split(data.melt, data.melt$similarity)
    
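    lapply(data.split, function(x) {
      # as.character() keeps the product names intact even when one column is
      # a factor and the other is a character vector (the default since R 4.0)
      unique(c(as.character(x$product1), as.character(x$product2)))
    })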
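
The grouping described in the question ("P1 is similar to P2, P2 is similar to P4, so all three are the same") amounts to finding connected groups of products. Below is a minimal base R sketch of that idea, not the method from the answer above. The helper name assign_ref_codes is hypothetical, and the sketch assumes that similarity should be read as symmetric (a 1 in either direction links two products) and that the reference code is simply the first product in each group. It uses dense matrix products, so for a genuinely large sparse matrix a graph library would probably be a better fit.

```r
# Similarity matrix from the question (rows and columns in the same order).
sim <- matrix(
  c(1, 1, 0, 0,
    0, 1, 0, 1,
    0, 0, 1, 1,
    0, 1, 0, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("P", 1:4), paste0("P", 1:4))
)

# Hypothetical helper: label every product with the first product of its group.
assign_ref_codes <- function(sim) {
  # Assumption: similarity is symmetric, so a 1 in either direction counts.
  adj <- ((sim + t(sim)) > 0) * 1
  diag(adj) <- 1

  # Transitive closure: keep extending reachability until nothing changes.
  repeat {
    reach <- ((adj %*% adj) > 0) * 1
    if (all(reach == adj)) break
    adj <- reach
  }

  # The reference code is the first product each row can reach.
  first_reachable <- apply(adj, 1, function(r) which(r > 0)[1])
  data.frame(Product_Name = rownames(sim),
             Ref_Code = rownames(sim)[first_reachable],
             stringsAsFactors = FALSE)
}

# With the matrix exactly as posted, P3 is linked to P4, so all four
# products fall into one group.
print(assign_ref_codes(sim))

# If the P3/P4 entry were 0 instead, P3 would stay in its own group.
sim_without_p3_link <- sim
sim_without_p3_link["P3", "P4"] <- 0
print(assign_ref_codes(sim_without_p3_link))
```

With the posted matrix every product receives the code P1, which is consistent with the observation that all products end up the same. The table in the question (P3 keeping its own code) only comes out once the P3/P4 entry is set to 0.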