Tags: python, r, python-3.x, matrix, sparse-matrix

Creating matrix of 0 and 1 from a string vector in R or python


I want to create a matrix of 0s and 1s from a character vector in which each string contains the two names to map to a row and a column of the matrix. For example, given the following vector

vector_matrix <- c("A_B", "A_C", "B_C", "B_D", "C_D")

I would like to transform it into the following matrix

  A B C D
A 0 1 1 0
B 0 0 1 1
C 0 0 0 1
D 0 0 0 0

I am open to any suggestion, but a built-in function that handles this would be best. My real problem is very similar but much larger: the resulting matrix will have about 25 million cells.

I would prefer R code, but a Pythonic solution is fine too :)

Edit: By "A_B" I mean a "1" in row A, column B. The opposite orientation (row B, column A) would also be fine.

Edit: I would like to have a matrix where its rownames and colnames are the letters.


Solution

  • Create a two-column data frame d from the data, compute the common set of levels, convert each column of d to a factor with those levels, and finally run table. The second line sorts each row; it isn't needed for the input shown and could be omitted, but you might need it for other data if B_A is to be regarded as A_B.

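    # Split each "X_Y" string into two columns, V1 and V2
    d <- read.table(text = vector_matrix, sep = "_")
    # Sort within each row so that B_A is treated as A_B (optional for this input)
    d[] <- t(apply(d, 1, sort))
    # Give both columns the same levels, then cross-tabulate them
    tab <- table( lapply(d, factor, levels = levels(factor(unlist(d)))) )
    tab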
    

    giving this table:

       V2
    V1  A B C D
      A 0 1 1 0
      B 0 0 1 1
      C 0 0 0 1
      D 0 0 0 0
    
    
    # Show the table as a heatmap; Rowv = NA and Colv = NA turn off reordering and dendrograms
    heatmap(tab[nrow(tab):1, ], NA, NA, col = 2:3, symm = TRUE)
    

    (screenshot: heatmap of tab)

    library(igraph)
    # Treat the table as an adjacency matrix and plot it as an undirected graph
    g <- graph_from_adjacency_matrix(tab, mode = "undirected")
    plot(g)
    

    (screenshot: plot of the graph g)
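
Since the question also allows Python, the same split / sort / common-levels / tabulate steps can be sketched with the standard library alone. This is only an illustration, not part of the R answer: the function name pairs_to_matrix and its parameters are made up here, it assumes each string contains exactly one separator, and it writes a 1 for each pair rather than counting repeats the way table does.

```python
def pairs_to_matrix(pairs, sep="_", sort_pairs=True):
    """Return (labels, matrix) with matrix[i][j] == 1 when the pair
    labels[i] + sep + labels[j] occurs in `pairs`.

    Hypothetical helper for illustration; assumes exactly one `sep` per string.
    """
    # Split "A_B" into ("A", "B"); this fails loudly if a string has no separator.
    split = [tuple(p.split(sep, 1)) for p in pairs]
    if sort_pairs:
        # Optional step, like the row sort in the R code: treat "B_A" as "A_B".
        split = [tuple(sorted(pair)) for pair in split]

    # Common, sorted label set shared by rows and columns (the "levels").
    labels = sorted({name for pair in split for name in pair})
    index = {name: i for i, name in enumerate(labels)}

    # Dense square matrix of zeros, then mark each observed pair with a 1.
    matrix = [[0] * len(labels) for _ in labels]
    for row_name, col_name in split:
        matrix[index[row_name]][index[col_name]] = 1
    return labels, matrix


labels, matrix = pairs_to_matrix(["A_B", "A_C", "B_C", "B_D", "C_D"])
print("  " + " ".join(labels))
for name, row in zip(labels, matrix):
    print(name, " ".join(str(v) for v in row))
```

For the sample vector this prints the same 4 x 4 layout as the table above. A list of lists is dense, so for a matrix of about 25 million cells it may be worth keeping only the (row, column) index pairs and handing them to a sparse container such as scipy.sparse.coo_matrix instead.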