Search code examples
rmerge

Summarize two tables into new one


I have these two different tables:

#table1 all patients that have RNASeq data
experimental.strategy | submitter.id
      RNA-Seq         | TCGA-AA-3867
      RNA-Seq         | TCGA-F4-6809
      RNA-Seq         | TCGA-AA-3562
                    ...

#table2 all patients that have miRNAseq data

experimental.strategy   | submitter.id
      miRNA-Seq         | TCGA-A6-6650
      miRNA-Seq         | TCGA-AZ-4308
      miRNA-Seq         | TCGA-AA-A02Y
                     ...

There are patients with just one data available, so I have three kinds of patients: with only RNA-Seq data, with only miRNA-Seq data, and patients with both miRNA-Seq and RNA-Seq data available.

I want to create a new table with all patients IDs summarizing the data of those tables like this:

submitter.id |  miRNA-Seq  |  RNA-Seq  |  Paired
TCGA-4T-AA8H |      0      |     1     |    0
TCGA-5M-AAT5 |      1      |     1     |    1
TCGA-3L-AA1B |      1      |     0     |    0
TCGA-AA-A02Y |      0      |     1     |    0

How can I do this?


Solution

  • What do you need the third table for when you can get all the required information from Table 1 and Table 2? Use full_join to merge Table 1 and Table 2 and get the required result shown below

    ## Input Data
    
    df1 <- read.table(text = "experimental.strategy  submitter.id
          RNA-Seq          TCGA-AA-3867
          RNA-Seq          TCGA-F4-6809
          RNA-Seq          TCGA-AA-3562", header = TRUE)
    
    df2 <- read.table(text = "experimental.strategy    submitter.id
          miRNA-Seq          TCGA-AA-3867
          miRNA-Seq          TCGA-F4-6809
          miRNA-Seq          TCGA-AA-A02Y", header = TRUE)
       
    
    df1 <- df1 %>% rename(RNA_Seq = experimental.strategy) %>%
      mutate(RNA_Seq = str_replace(RNA_Seq, "RNA-Seq","1")) %>%
      mutate(RNA_Seq = as.numeric(RNA_Seq))
    
    
    df2 <- df2 %>% rename(miRNA_Seq = experimental.strategy) %>%
      mutate(miRNA_Seq = str_replace(miRNA_Seq, "miRNA-Seq","1")) %>%
      mutate(miRNA_Seq = as.numeric(miRNA_Seq))
    
    
    df1 %>% full_join(df2, by = ("submitter.id")) %>%
      mutate_if(is.numeric,coalesce,0) %>% group_by(submitter.id) %>%
      mutate(Paired = if_else((RNA_Seq == 1 & miRNA_Seq == 1), 1, 0)) 
    
    ## Ouput
    
      RNA_Seq submitter.id miRNA_Seq Paired
        <dbl> <chr>            <dbl>  <dbl>
    1       1 TCGA-AA-3867         1      1
    2       1 TCGA-F4-6809         1      1
    3       1 TCGA-AA-3562         0      0
    4       0 TCGA-AA-A02Y         1      0