
How can we remove rows that share common words across multiple data frames in R?


I have 36 data frames, divided into two groups, H and E, with 18 in each. Every data frame has a column called 'AA' that contains multiple words. I would like to remove all the words in H that also exist in E, and keep only the words that are not shared between H and E.

Do you have any idea how I could do this?

Thank you for your help!


Solution

  • OK, I tried to simulate your environment. I have a list, my_dfs, with 4 data frames (H1, H2, E1, E2) belonging to two groups, H and E. See the R code below.

    # Group H: two example data frames, each with an AA column
    H1 <- data.frame(Name = c('Marcel', 'Bob', 'John'),
                     AA = c('Soccer', 'Swimming', 'Baseball'))

    H2 <- data.frame(Age = c('20', '41', '22'),
                     AA = c('something', 'something4', 'something5'))

    # Group E: 'something' and 'Baseball' also appear in group H
    E1 <- data.frame(Age = c('20', '41', '22'),
                     AA = c('something', 'something2', 'something3'))

    E2 <- data.frame(Age = c('20', '41', '22'),
                     AA = c('Basketball', 'Volleyball', 'Baseball'))

    # Collect all data frames in one list
    my_dfs <- list(H1, H2, E1, E2)
    

    Some words in the AA column appear in more than one of the 4 data frames. The goal is to remove, from each data frame, the rows whose AA value also appears in the AA column of any of the other data frames.

    purrr::map(seq_along(my_dfs),
               ~ dplyr::anti_join(my_dfs[[.x]],
                                  dplyr::bind_rows(my_dfs[-.x]),
                                  by = 'AA'))
    

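    The code above should solve your problem, provided that no word is shared between data frames of the same group, since each data frame is compared against all the others.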
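
If words shared only within one group should be kept, a group-wise comparison may match the question more closely. The following is a minimal sketch in base R, not the approach used above: it assumes the data frames are stored in two separate lists, and the names H, E, words_in, drop_shared, H_only and E_only are hypothetical, chosen for illustration.

```r
# Toy data mirroring the example above, split into two lists by group.
H <- list(
  H1 = data.frame(Name = c("Marcel", "Bob", "John"),
                  AA = c("Soccer", "Swimming", "Baseball")),
  H2 = data.frame(Age = c("20", "41", "22"),
                  AA = c("something", "something4", "something5"))
)
E <- list(
  E1 = data.frame(Age = c("20", "41", "22"),
                  AA = c("something", "something2", "something3")),
  E2 = data.frame(Age = c("20", "41", "22"),
                  AA = c("Basketball", "Volleyball", "Baseball"))
)

# Collect the distinct AA words found anywhere in a group.
words_in <- function(dfs) {
  unique(unlist(lapply(dfs, function(df) as.character(df$AA))))
}

# Words that occur in both groups.
shared <- intersect(words_in(H), words_in(E))

# Drop every row whose AA value is one of the shared words.
drop_shared <- function(dfs, words) {
  lapply(dfs, function(df) df[!(df$AA %in% words), , drop = FALSE])
}

H_only <- drop_shared(H, shared)
E_only <- drop_shared(E, shared)
```

With 18 data frames per group, the same functions apply unchanged as long as each group is held in its own list. A word that appears in two H data frames but in no E data frame is kept here, whereas the anti_join version would drop it.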