Summarize two tables into new one

I have these two different tables:

#table1 all patients that have RNASeq data
experimental.strategy | submitter.id
      RNA-Seq         | TCGA-AA-3867
      RNA-Seq         | TCGA-F4-6809
      RNA-Seq         | TCGA-AA-3562
                    ...

#table2 all patients that have miRNAseq data

experimental.strategy   | submitter.id
      miRNA-Seq         | TCGA-A6-6650
      miRNA-Seq         | TCGA-AZ-4308
      miRNA-Seq         | TCGA-AA-A02Y
                     ...

There are patients with just one data available, so I have three kinds of patients: with only RNA-Seq data, with only miRNA-Seq data, and patients with both miRNA-Seq and RNA-Seq data available.

I want to create a new table with all patients IDs summarizing the data of those tables like this:

submitter.id |  miRNA-Seq  |  RNA-Seq  |  Paired
TCGA-4T-AA8H |      0      |     1     |    0
TCGA-5M-AAT5 |      1      |     1     |    1
TCGA-3L-AA1B |      1      |     0     |    0
TCGA-AA-A02Y |      0      |     1     |    0

How can I do this?

Solution

What do you need the third table for when you can get all the required information from Table 1 and Table 2? Use full_join to merge Table 1 and Table 2 and get the required result shown below

## Input Data

df1 <- read.table(text = "experimental.strategy  submitter.id
      RNA-Seq          TCGA-AA-3867
      RNA-Seq          TCGA-F4-6809
      RNA-Seq          TCGA-AA-3562", header = TRUE)

df2 <- read.table(text = "experimental.strategy    submitter.id
      miRNA-Seq          TCGA-AA-3867
      miRNA-Seq          TCGA-F4-6809
      miRNA-Seq          TCGA-AA-A02Y", header = TRUE)
   

df1 <- df1 %>% rename(RNA_Seq = experimental.strategy) %>%
  mutate(RNA_Seq = str_replace(RNA_Seq, "RNA-Seq","1")) %>%
  mutate(RNA_Seq = as.numeric(RNA_Seq))


df2 <- df2 %>% rename(miRNA_Seq = experimental.strategy) %>%
  mutate(miRNA_Seq = str_replace(miRNA_Seq, "miRNA-Seq","1")) %>%
  mutate(miRNA_Seq = as.numeric(miRNA_Seq))


df1 %>% full_join(df2, by = ("submitter.id")) %>%
  mutate_if(is.numeric,coalesce,0) %>% group_by(submitter.id) %>%
  mutate(Paired = if_else((RNA_Seq == 1 & miRNA_Seq == 1), 1, 0)) 

## Ouput

  RNA_Seq submitter.id miRNA_Seq Paired
    <dbl> <chr>            <dbl>  <dbl>
1       1 TCGA-AA-3867         1      1
2       1 TCGA-F4-6809         1      1
3       1 TCGA-AA-3562         0      0
4       0 TCGA-AA-A02Y         1      0