Converting a Drug-Gene Interaction List to a Similarity Matrix in R


I have an interaction list, where the first column contains different drugs, and the second column contains genes which the drug interacts with.

For example the code below:

DGIdbpractice <- data.frame(c("drug1", "drug1", "drug1", "drug2", "drug2", "drug3","drug3","drug3"), c("gene1", "gene2", "gene3", "gene2", "gene3", "gene1", "gene3", "gene4"))
names(DGIdbpractice) <- c("drug", "gene")

Produces a dataframe which looks like:

  drug  gene
1 drug1 gene1
2 drug1 gene2
3 drug1 gene3
4 drug2 gene2
5 drug2 gene3
6 drug3 gene1
7 drug3 gene3
8 drug3 gene4

I want to create a similarity matrix comparing every drug with itself and with every other drug, where each value is the number of genes that the two drugs both interact with.

It should look like the below matrix:

      drug1 drug2 drug3
drug1     3     2     2
drug2     2     2     1
drug3     2     1     3

I do not want to use multiple loops as the actual dataset contains over 4,000 drugs.

Thank you.


Solution

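  • You can use dplyr to join the list to itself on gene, then count the resulting drug pairs with the table() function. With dplyr 1.1.0 or later the join warns about a many-to-many relationship; that is expected here and can be silenced by passing relationship = "many-to-many".

    library(dplyr)

    # Self-join on gene: one row for every pair of drugs that share a gene
    b <- DGIdbpractice %>% full_join(DGIdbpractice, by = "gene")
    table(b$drug.x, b$drug.y)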
    
            drug1 drug2 drug3
      drug1     3     2     2
      drug2     2     2     1
      drug3     2     1     3
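
A base-R alternative, offered as a sketch rather than as part of the original answer: build the drug-by-gene incidence table and multiply it by its own transpose with tcrossprod(). Entry (i, j) of the product is the number of genes shared by drug i and drug j, so the intermediate joined table is never built. This assumes each drug-gene pair appears at most once in the list; the names `incidence` and `similarity` are illustrative.

```r
# Example data from the question
DGIdbpractice <- data.frame(
  drug = c("drug1", "drug1", "drug1", "drug2", "drug2", "drug3", "drug3", "drug3"),
  gene = c("gene1", "gene2", "gene3", "gene2", "gene3", "gene1", "gene3", "gene4")
)

# Drug-by-gene incidence table: 1 where the drug interacts with the gene.
# Assumption: no duplicated drug-gene rows (apply unique() first if unsure).
incidence <- table(DGIdbpractice$drug, DGIdbpractice$gene)

# incidence %*% t(incidence): shared-gene counts for every pair of drugs
similarity <- tcrossprod(incidence)
similarity
```

The diagonal holds the number of genes each drug interacts with, and the matrix is symmetric, matching the table() output above. For a list with thousands of drugs and many genes, the dense incidence table may become large, in which case a sparse representation could be worth considering.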