Search code examples
rdplyr

Matching elements across datasets and replacing cell from one dataset with that of another


I have the following dumbed down datasets, and want to replace the 1:70728:C:T with rs123456.

a<- data.frame(ID=c(3,44),SNP=c("1:70728:C:T","1:612758:T:C "),effect_allele.outcome=c("C","T"), other_allele.outcome=c("T","C"), eaf.outcome=c(0.997843,0.993087), beta.outcome=c(0.03372300,-0.00728038), se.outcome=c(0.0250305, 0.0139970), pval.outcome=c(0.19,0.60), outcome=c("outcome","outcome"))


b <- data.frame(SNP=c("1:70728:C:T"), rsID=c("rs123456"))

I tried doing the following

library(dplyr)
insomnia_chr_pos %>%
  mutate(SNP = case_when((SNP==matched_mild$SNP) ~ matched_mild$rsid,
                         (SNP==matched_hosp$SNP) ~ matched_hosp$rsid,
                         (SNP==matched_sevr$SNP) ~ matched_sevr$rsid))

But I get this error:

Error in `mutate()`: ℹ In argument: `SNP = case_when(...)`. Caused by error in `case_when()`: ! Can't recycle `..1 (left)` (size 27697) to match `..1 (right)` (size 85).

the dataset with size 27697 corresponds with a, and the one size 85 corresponds with b.

Any help is greatly appreciated! Many thanks!


Solution

  • It seems that what you want to do is join the tables.

    In your example this would mean doing a left join like so:

    joined <- a %>% 
      left_join(b, by = "SNP")
    

    From your example code I understand that you want to join three tables and keep every match with SNP.

    If the three tables have the same columns you can do a simply bind the rows together with:

    bind_rows(b, c, d) 
    

    Then you can do the join as above.

    If they don't have the same columns, you can join all of them to insomnia_chr_pos and then create a new column with mutate where you keep every record from the three joined columns that is not NA.