
Transform categorical attribute vector into similarity matrix


I need to transform a categorical attribute vector into a "same attribute matrix" using R.

For example, I have a vector that records the gender of N people (male = 1, female = 0). I need to convert this vector into an NxN matrix named A (with people's names on the rows and columns), where each cell Aij is 1 if persons i and j have the same gender and 0 otherwise.

Here is an example with 3 persons (first male, second female, third male), which produces this vector:

c(1, 0, 1) 

I want to transform it into this matrix:

A = matrix( c(1, 0, 1, 0, 1, 0, 1, 0, 1), nrow=3, ncol=3, byrow = TRUE) 

Solution

  • As lmo said in a comment, it's impossible to know the structure of your dataset, so what follows is just an example of how it could be done.
    First, make up some data.

    set.seed(3488)    # make the results reproducible
    x <- LETTERS[1:5]
    y <- sample(0:1, 5, TRUE)
    df <- data.frame(x, y)
    

    Now compare every pair of values with outer() and label the rows and columns with the names:

    A <- outer(df$y, df$y, function(a, b) as.integer(a == b))
    dimnames(A) <- list(df$x, df$x)
    A
    #  A B C D E
    #A 1 1 1 0 0
    #B 1 1 1 0 0
    #C 1 1 1 0 0
    #D 0 0 0 1 1
    #E 0 0 0 1 1
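
To tie this back to the question, here is a minimal sketch that applies the same outer() call to the 3-person vector c(1, 0, 1) and checks the result against the matrix the question asks for. The names in `people` are made up for illustration, since the question does not give any.

```r
# Gender vector from the question: 1 = male, 0 = female
gender <- c(1, 0, 1)

# Hypothetical names (not in the original question), used as row/column labels
people <- c("Al", "Bea", "Cy")

# outer() evaluates the comparison for every (i, j) pair of entries;
# as.integer() turns the TRUE/FALSE results into 1/0
A <- outer(gender, gender, function(a, b) as.integer(a == b))
dimnames(A) <- list(people, people)

# The target matrix from the question, with the same labels attached
expected <- matrix(c(1, 0, 1, 0, 1, 0, 1, 0, 1),
                   nrow = 3, ncol = 3, byrow = TRUE,
                   dimnames = list(people, people))

# Should report that the two matrices agree
isTRUE(all.equal(A, expected))
```

Because the comparison is a plain `==`, the same approach should also work for attributes with more than two categories, or for character vectors, without any change to the outer() call.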