I have these two different tables:
#table1 all patients that have RNASeq data
experimental.strategy | submitter.id
RNA-Seq | TCGA-AA-3867
RNA-Seq | TCGA-F4-6809
RNA-Seq | TCGA-AA-3562
...
#table2 all patients that have miRNAseq data
experimental.strategy | submitter.id
miRNA-Seq | TCGA-A6-6650
miRNA-Seq | TCGA-AZ-4308
miRNA-Seq | TCGA-AA-A02Y
...
There are patients with just one data available, so I have three kinds of patients: with only RNA-Seq data, with only miRNA-Seq data, and patients with both miRNA-Seq and RNA-Seq data available.
I want to create a new table with all patients IDs summarizing the data of those tables like this:
submitter.id | miRNA-Seq | RNA-Seq | Paired
TCGA-4T-AA8H | 0 | 1 | 0
TCGA-5M-AAT5 | 1 | 1 | 1
TCGA-3L-AA1B | 1 | 0 | 0
TCGA-AA-A02Y | 0 | 1 | 0
How can I do this?
What do you need the third table for when you can get all the required information from Table 1 and Table 2? Use full_join
to merge
Table 1
and Table 2
and get the required result shown below
## Input Data
df1 <- read.table(text = "experimental.strategy submitter.id
RNA-Seq TCGA-AA-3867
RNA-Seq TCGA-F4-6809
RNA-Seq TCGA-AA-3562", header = TRUE)
df2 <- read.table(text = "experimental.strategy submitter.id
miRNA-Seq TCGA-AA-3867
miRNA-Seq TCGA-F4-6809
miRNA-Seq TCGA-AA-A02Y", header = TRUE)
df1 <- df1 %>% rename(RNA_Seq = experimental.strategy) %>%
mutate(RNA_Seq = str_replace(RNA_Seq, "RNA-Seq","1")) %>%
mutate(RNA_Seq = as.numeric(RNA_Seq))
df2 <- df2 %>% rename(miRNA_Seq = experimental.strategy) %>%
mutate(miRNA_Seq = str_replace(miRNA_Seq, "miRNA-Seq","1")) %>%
mutate(miRNA_Seq = as.numeric(miRNA_Seq))
df1 %>% full_join(df2, by = ("submitter.id")) %>%
mutate_if(is.numeric,coalesce,0) %>% group_by(submitter.id) %>%
mutate(Paired = if_else((RNA_Seq == 1 & miRNA_Seq == 1), 1, 0))
## Ouput
RNA_Seq submitter.id miRNA_Seq Paired
<dbl> <chr> <dbl> <dbl>
1 1 TCGA-AA-3867 1 1
2 1 TCGA-F4-6809 1 1
3 1 TCGA-AA-3562 0 0
4 0 TCGA-AA-A02Y 1 0