Search code examples
rdataframenested-lists

Extract repeated observations of single column in a list of dataframes R


I have the following reproducible example:

# I have 4 dataframes in a listy with a few observations that repeat themselves
df_1 <- data.frame(x = c(1, 2, 3, 4), y = c('apple', 'pineapple', 'orange', 'grape'))
df_2 <- data.frame(x = c(2, 3, 4, 5, 6, 7), y = c('watermelon', 'orange', 'halibut', 'apple', 'iron', 'grape'))
df_3 <- data.frame(x = c(2, 3, 4, 5, 6, 7, 9, 0), y = c('rock', 'pineapple', 'apple', 'tire', 'bomb', 'star', 'coconut', 'grape'))
df_4 <- data.frame(x = c(1, 4, 9), y = c('grape', 'apple', 'rock'))

# All inside a another list
df_list <- list(df_1, df_2, df_3, df_4)

I want a function that returns the values that are repeated in the y column in each dataframe, disregarding the order of the observation in the column, so the result of the function should only be:

[1] "apple" "grape"

I have tried reduce(intersect, big_list) but the function is not capturing all the repeated values across the dataframes. It may be because the position of the repeated value in each column is not the same or because column sizes are not equal between dataframes in list.


Solution

  • To find values that appear in all data frames, you can use Reduce() to apply intersect() to each y column (which we can extract with lapply()).

    Reduce(intersect, lapply(df_list, \(df) df$y))
    # [1] "apple" "grape"
    

    Alternatively, you can do:

    Reduce(
        \(df1, df2) intersect(df1, df2$y),
        df_list[-1],
        init = df_list[[1]]$y
    )
    # [1] "apple" "grape"