
How to create a matrix based on information in other dataframe in R?


I have a data frame of gene names and the miRNAs they interact with. It looks like this:

df:

Gene      miRNA
ACP1    hsa-let-7a-5p
AGO4    hsa-let-7a-5p
AMMECR1 hsa-let-7a-5p
ATM     hsa-miR-100-5p
BMPR2   hsa-miR-100-5p
AGO1    hsa-miR-107
AGO2    hsa-miR-107
AGO3    hsa-miR-107

From this gene-miRNA interaction information I want to create a matrix with genes as rows and miRNAs as columns, containing 1 where there is an interaction and 0 where there is none. The matrix should look like this:

          hsa-let-7a-5p hsa-miR-100-5p  hsa-miR-107
ACP1           1              0              0
AGO4           1              0              0
AMMECR1        1              0              0
ATM            0              1              0
BMPR2          0              1              0
AGO1           0              0              1
AGO2           0              0              1 
AGO3           0              0              1

I tried using xtabs for this, but couldn't get it to work:

xtabs(c(1L, 0L)[miRNA] ~ ., data=df)

The result looks like this:

Gene
   ACP1    AGO1    AGO2    AGO3    AGO4 AMMECR1     ATM   BMPR2 
      1       0       0       0       1       1       0       0 

Any help is appreciated. Thank you.


Solution

  • We can add a dummy column of 1s with mutate and use pivot_wider to cast the data into wide format, filling the Gene-miRNA combinations that do not occur with 0.

    library(dplyr)
    library(tidyr) # version ‘1.0.0’
    
    df %>%
      mutate(n = 1) %>%
      pivot_wider(names_from = miRNA, values_from = n, values_fill = list(n = 0))
      # Or, with the older tidyr interface:
      # spread(miRNA, n, fill = 0)
    
    
    #  Gene    `hsa-let-7a-5p` `hsa-miR-100-5p` `hsa-miR-107`
    #  <fct>             <dbl>            <dbl>         <dbl>
    #1 ACP1                  1                0             0
    #2 AGO4                  1                0             0
    #3 AMMECR1               1                0             0
    #4 ATM                   0                1             0
    #5 BMPR2                 0                1             0
    #6 AGO1                  0                0             1
    #7 AGO2                  0                0             1
    #8 AGO3                  0                0             1
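
The pipeline above returns a tibble with Gene as an ordinary column. If an actual matrix with gene names as row names is needed, as the expected output in the question suggests, one possible way is sketched below. The data frame is re-created from the question's example, and the names wide and mat are only illustrative.

```r
library(dplyr)
library(tidyr)

# Example data copied from the question
df <- data.frame(
  Gene  = c("ACP1", "AGO4", "AMMECR1", "ATM", "BMPR2", "AGO1", "AGO2", "AGO3"),
  miRNA = c(rep("hsa-let-7a-5p", 3), rep("hsa-miR-100-5p", 2), rep("hsa-miR-107", 3))
)

# Same wide table as in the answer
wide <- df %>%
  mutate(n = 1) %>%
  pivot_wider(names_from = miRNA, values_from = n, values_fill = list(n = 0))

# Drop the Gene column, convert the remaining numeric columns to a matrix,
# and use the gene names as row names
mat <- as.matrix(wide[, -1])
rownames(mat) <- as.character(wide$Gene)
```

Indexing by name then works as usual, for example mat["ATM", "hsa-miR-100-5p"].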
    

    If the same Gene-miRNA pair appears in more than one row, use distinct first so that each pair is kept only once.

    df %>%
      distinct() %>%
      mutate(n = 1) %>%
      pivot_wider(names_from = miRNA, values_from = n, values_fill = list(n = 0))
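
Since the question started from xtabs, here is a base R sketch along those lines (an alternative, not part of the answer above). In an xtabs formula the left-hand side holds values to be summed and the right-hand side holds the cross-classifying factors, so both Gene and miRNA go on the right; with an empty left-hand side, xtabs simply counts rows. The names tab and mat are only illustrative.

```r
# Example data copied from the question
df <- data.frame(
  Gene  = c("ACP1", "AGO4", "AMMECR1", "ATM", "BMPR2", "AGO1", "AGO2", "AGO3"),
  miRNA = c(rep("hsa-let-7a-5p", 3), rep("hsa-miR-100-5p", 2), rep("hsa-miR-107", 3))
)

# Count rows for every Gene/miRNA combination
tab <- xtabs(~ Gene + miRNA, data = df)

# Turn counts into 0/1 so that duplicated pairs still give 1,
# and return a plain integer matrix instead of a table object
mat <- matrix(as.integer(tab > 0), nrow = nrow(tab), dimnames = dimnames(tab))

# xtabs sorts genes alphabetically; restore the order of first appearance
mat <- mat[unique(as.character(df$Gene)), ]
```

This needs no extra packages and gives a matrix directly, at the cost of having to restore the original row order by hand.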