I have 36 data frames, divided into two groups, H and E, with 18 in each. Every data frame has a column called 'AA' that contains words. I would like to remove all the words in H that also exist in E, keeping only the words that are not shared between H and E.
Do you have any idea how I could do this?
Thank you for your help!
OK, I tried to simulate your setup. I have a list, `my_dfs`, with 4 data frames, `H1`, `H2`, `E1` and `E2`, belonging to two groups, `H` and `E`, as you can imagine. See the R code below.
```r
H1 <- data.frame(Name = c('Marcel', 'Bob', 'John'),
                 AA   = c('Soccer', 'Swimming', 'Baseball'))
H2 <- data.frame(Age = c('20', '41', '22'),
                 AA  = c('something', 'something4', 'something5'))
E1 <- data.frame(Age = c('20', '41', '22'),
                 AA  = c('something', 'something2', 'something3'))
E2 <- data.frame(Age = c('20', '41', '22'),
                 AA  = c('Basketball', 'Volleyball', 'Baseball'))
my_dfs <- list(H1, H2, E1, E2)
```
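With 36 data frames, writing `list(H1, H2, ...)` by hand gets tedious. Assuming your objects are named `H1` ... `H18` and `E1` ... `E18` (an assumption; adjust the names to yours), base R's `mget()` can collect them by name. In this sketch the toy stand-ins and the names `toy_env`, `H_dfs`, `E_dfs` and `all_dfs` are hypothetical; the toys live in their own environment so they do not overwrite the example data above.

```r
# Hypothetical toy stand-ins, kept in a separate environment so they do not
# overwrite the example data frames; with real data you would look in globalenv().
toy_env <- new.env()
for (i in 1:18) {
  assign(paste0("H", i), data.frame(AA = paste0("h_word", i)), envir = toy_env)
  assign(paste0("E", i), data.frame(AA = paste0("e_word", i)), envir = toy_env)
}

# mget() looks the objects up by name and returns a named list
H_dfs <- mget(paste0("H", 1:18), envir = toy_env)
E_dfs <- mget(paste0("E", 1:18), envir = toy_env)

# One list shaped like my_dfs: the 18 H data frames followed by the 18 E data frames
all_dfs <- c(H_dfs, E_dfs)
```

`all_dfs` can then be used wherever `my_dfs` appears, and keeping `H_dfs` and `E_dfs` separate is handy if you want to treat the two groups differently.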
Some words in the `AA` column are common to several of the 4 data frames, and you would like to remove, from each data frame, the rows whose `AA` word also appears in the `AA` column of any of the others.
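To make that rule concrete, here is a minimal base R sketch of the same logic (no dplyr or purrr), using only the `AA` columns of the toy data for brevity. It is an illustrative alternative, not the method used below; the names `other_words` and `result` are hypothetical.

```r
# Toy data: AA columns only, named so the results are easy to inspect
my_dfs <- list(
  H1 = data.frame(AA = c("Soccer", "Swimming", "Baseball")),
  H2 = data.frame(AA = c("something", "something4", "something5")),
  E1 = data.frame(AA = c("something", "something2", "something3")),
  E2 = data.frame(AA = c("Basketball", "Volleyball", "Baseball"))
)

result <- lapply(seq_along(my_dfs), function(i) {
  # Every AA word found in the *other* data frames
  other_words <- unlist(lapply(my_dfs[-i], `[[`, "AA"))
  df <- my_dfs[[i]]
  # Keep only the rows whose word appears in no other data frame
  df[!df$AA %in% other_words, , drop = FALSE]
})
names(result) <- names(my_dfs)
```

Here `result$H1` keeps "Soccer" and "Swimming", because "Baseball" also occurs in `E2`.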
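```r
# For each data frame i, drop the rows whose AA value appears in any other data frame
purrr::map(seq_along(my_dfs),
           ~ dplyr::anti_join(my_dfs[[.x]],                   # data frame i
                              dplyr::bind_rows(my_dfs[-.x]),  # all the other data frames, stacked
                              by = 'AA'))                     # match rows on the AA column
```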
The code above returns a list with one filtered data frame per element of `my_dfs`, in the same order, which should solve your problem.
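Note that the code above compares each data frame with every other one, so a word shared only by two H data frames would be removed as well. If, as the question states, only the words shared between the H group and the E group should go, a sketch along these lines pools the words per group first. The lists `H` and `E` and the helper `drop_shared` are hypothetical; "Soccer" was added to `H2` to show that a word shared only within H is kept.

```r
# Hypothetical grouping: H and E are two named lists of data frames
H <- list(
  H1 = data.frame(AA = c("Soccer", "Swimming", "Baseball")),
  H2 = data.frame(AA = c("something", "something4", "Soccer"))
)
E <- list(
  E1 = data.frame(AA = c("something", "something2", "something3")),
  E2 = data.frame(AA = c("Basketball", "Volleyball", "Baseball"))
)

# Pool the words of each group into one vector
words_H <- unique(unlist(lapply(H, `[[`, "AA")))
words_E <- unique(unlist(lapply(E, `[[`, "AA")))
# Words that occur in both groups
shared <- intersect(words_H, words_E)

# Drop the cross-group words from every data frame in both groups
drop_shared <- function(df) df[!df$AA %in% shared, , drop = FALSE]
H_clean <- lapply(H, drop_shared)
E_clean <- lapply(E, drop_shared)
```

Here `shared` is "Baseball" and "something", while "Soccer" survives in both `H1` and `H2` because it never appears in E.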