Tags: r, list, function, intersect

How do I find common characters in a list of dataframes?


I have about 70 data frames in a list, and each has a column named SNP. I want to find the SNPs that are present in every data frame. This is the code I used:

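setwd("~")
library(data.table)  # fread()
library(purrr)       # map(), reduce()
library(magrittr)    # %>%

# Read every file in the working directory into a list of data.tables
files <- list.files()
dflist <- list()
for(i in seq_along(files)){
 dflist[[i]] <- fread(files[i])
}

# Pull the SNP column from each frame and intersect them all
map(dflist, ~.$SNP) %>%
  reduce(intersect)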

However, this returns an empty result:

character(0)

Here is a sample of the list, from dput():
list(structure(list(`10:103391446` = c("10:115562764:TTTC_",
"10:115562765:TTC_T", "10:14188623_CCTGA_C", "10:15988900:G_GGT"
)), row.names = c(NA, -4L), class = c("data.table", "data.frame"
)), structure(list(SNP = c("rs34394051",
"rs11121177", "rs10799615", "rs590013")), row.names = c(NA, -4L
), class = c("data.table", "data.frame")),
    structure(list(SNP = c("rs34394051", "rs11121177", "rs10799615",
    "rs590013")), row.names = c(NA, -4L), class = c("data.table",
    "data.frame")))

Can you help please?


Solution

  • Your problems appear to be two-fold:

    1. One of your frames has no column named SNP. Extracting a missing column with $ returns NULL, which set operations treat as empty:

      setdiff(mtcars$QUUX, mtcars$cyl)
      # NULL
      

      This is easy to fix (names(dflist[[1]]) <- "SNP"), but it does not resolve all of the problems.

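    2. The data in your first frame look completely different: its values look like position-based identifiers (e.g. 10:115562764:TTTC_), while the other frames hold rs IDs, so it shares nothing with them. When I skip the first frame, it works: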

      map(dflist[-1], ~.$SNP) %>%
        reduce(intersect)
      # [1] "rs34394051" "rs11121177" "rs10799615" "rs590013"
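
With about 70 frames it may be easier to find the odd ones programmatically than by eye. The following is a minimal base R sketch of both checks above, not a definitive fix. The toy `dflist` stands in for the real list, the helper names (`has_snp`, `missing_snp`, `rs_share`) are made up for illustration, and two things are assumptions: that a frame without an SNP column has a single column that can be renamed, and that only frames containing at least one "rs" identifier should take part in the intersection.

```r
# Toy stand-in for the list in the question (plain data frames, so no packages needed)
dflist <- list(
  data.frame(`10:103391446` = c("10:115562764:TTTC_", "10:115562765:TTC_T"),
             check.names = FALSE, stringsAsFactors = FALSE),
  data.frame(SNP = c("rs34394051", "rs11121177", "rs10799615", "rs590013"),
             stringsAsFactors = FALSE),
  data.frame(SNP = c("rs34394051", "rs11121177", "rs10799615", "rs590013"),
             stringsAsFactors = FALSE)
)

# Check 1: which frames have no column named SNP?
has_snp <- vapply(dflist, function(d) "SNP" %in% names(d), logical(1))
missing_snp <- which(!has_snp)

# Assumption: those frames have a single column, so renaming it is safe
for (i in missing_snp) names(dflist[[i]])[1] <- "SNP"

# Check 2: what share of each frame's identifiers start with "rs"?
snps <- lapply(dflist, function(d) d$SNP)
rs_share <- vapply(snps, function(x) mean(grepl("^rs", x)), numeric(1))

# Assumption: frames with no rs IDs at all use a different naming scheme,
# so they are left out of the intersection
common <- Reduce(intersect, snps[rs_share > 0])
common
```

`missing_snp` and `which(rs_share == 0)` give the indices of the frames worth inspecting by hand. If the odd column name turns out to be the first data value read as a header, which is only a guess about the input file, re-reading that file with `fread(file, header = FALSE, col.names = "SNP")` would keep that value instead of losing it to the rename.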