I have up to 30 data frames with a common ID column. There are other colums in each df but im just showing the ID here.
Library DF1 DF2 DF3
ID# ID# ID# ....
1111 1111 1112 ....
2222 1111 3333 ....
3333 3333 3333 ....
4444 2222 4444 ....
I have to compare the ID# colum in each of theses tables to the library ID colum to make sure the Id number matches an ID number in the library.
Currently I use dplyr and do...
DF1 %>%
anti_join(library, by = 'ID#')
and the same for each table. It would just return any ID numbers that are not in the library, I do this same command for every data table but would like to run it for all 30 of my tables. I put all my DFs in a list but I'm not really sure how to proceed, for loop? apply? any help would be appreciated, as this pushes my boundaries of R knowledge.
You can iterate over a list of data.frame using purrr
. Here is an example using 3 data.frame to extract ID that are not common with reference one.
You can use any map_*
fonction taht suits you best any function you want inside a map_*
function
see purrr website for more info
library(dplyr)
#>
#> Attachement du package : 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
set.seed(999)
df_library <- data_frame(ID = sort(sample(1:12, 10)))
df1 <- data_frame(ID = sort(sample(1:12, 10)))
df2 <- data_frame(ID = sort(sample(1:12, 10)))
df3 <- data_frame(ID = sort(sample(1:12, 10)))
library(purrr)
#>
#> Attachement du package : 'purrr'
#> The following objects are masked from 'package:dplyr':
#>
#> contains, order_by
list(df1 = df1, df2 = df2, df3 = df3) %>%
map_df(~ anti_join(.x, df_library, by = "ID"), .id = "df_name")
#> # A tibble: 4 x 2
#> df_name ID
#> <chr> <int>
#> 1 df1 12
#> 2 df2 12
#> 3 df3 3
#> 4 df3 12