Tags: r, list, dataframe, dplyr, magrittr

Apply dplyr functions on a single column across a list using piping


I'm trying to filter a specific column across a list of dataframes. For a single dataframe, I would typically use dplyr like this:

library(dplyr)

# creating dataframe
df <- data.frame(a = 0:10, d = 10:20)

# filtering column a for rows greater than 7
df %>% filter(a > 7)

I've tried doing this across a list using the following:


# creating list
x <- list(data.frame(a = 0:10, b = 10:20),
          data.frame(c = 11:20, d = 21:30),
          data.frame(e = 15:25, f = 35:45))

# selecting the appropriate column and trying to filter
# this is not working
x[1][[1]][1] %>% lapply(. %>% {filter(. > 2)})

# however, if I use the min() function it works
x[1][[1]][1] %>% lapply(. %>% {min(.)})

I find the %>% syntax quite easy to read and write. However, in this case, selecting a specific column and doing something as simple as filtering is not working. I'm guessing map could be equally useful. Any help is appreciated.


Solution

  • It seems you want to check the condition on the first column of each dataframe in your list. One solution using dplyr would be:

    lapply(x, function(df) {df %>% filter_at(1, ~. > 7)})
    

    The 1 in filter_at indicates that I want to check the condition on the first column (1 is a positional index) of each dataframe in the list.
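
    As a minimal sketch of what the positional filter does on the list from the question, the snippet below applies it and counts the rows that survive in each dataframe. The names res and res_base are only illustrative. The base R line is an alternative I am adding for comparison, not part of the original answer; it may be useful because the scoped verbs such as filter_at are marked as superseded in recent dplyr releases.

    ```r
    library(dplyr)

    # The list from the question: column names differ between dataframes
    x <- list(data.frame(a = 0:10, b = 10:20),
              data.frame(c = 11:20, d = 21:30),
              data.frame(e = 15:25, f = 35:45))

    # Keep rows whose first column exceeds 7, addressing the column by position
    res <- lapply(x, function(df) df %>% filter_at(1, ~ . > 7))

    # Number of rows kept in each dataframe
    sapply(res, nrow)
    # 3 10 11

    # Base R alternative (assumption: plain subsetting is acceptable here)
    res_base <- lapply(x, function(df) df[df[[1]] > 7, , drop = FALSE])
    ```

    Both versions keep every column of each dataframe and only drop rows, so the result is still a list of three dataframes.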


    EDIT

    After the discussion in the comments, I propose the following solution, which assumes every dataframe has a column named a (as in the input data below):

    lapply(x, function(df) {df %>% filter(a > 7) %>% select(a) %>% slice(1)})
    

    Input data

    x <- list(data.frame(a = 0:10, b = 10:20), 
          data.frame(a = 11:20, b = 21:30), 
          data.frame(a = 15:25, b = 35:45))
    

    Output

    [[1]]
      a
    1 8
    
    [[2]]
       a
    1 11
    
    [[3]]
       a
    1 15
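
    Since the question mentions map, here is a hedged sketch of the same pipeline written with purrr::map instead of lapply. The helper name first_above_7 and the variables res and vals are hypothetical, introduced only for this illustration; the pipeline inside the helper is the one from the EDIT above.

    ```r
    library(dplyr)
    library(purrr)

    # Input data from the EDIT: every dataframe has a column named a
    x <- list(data.frame(a = 0:10, b = 10:20),
              data.frame(a = 11:20, b = 21:30),
              data.frame(a = 15:25, b = 35:45))

    # Hypothetical helper: first row of column a whose value exceeds 7
    first_above_7 <- function(df) {
      df %>% filter(a > 7) %>% select(a) %>% slice(1)
    }

    # map() returns a list of one-row dataframes, like lapply() does
    res <- map(x, first_above_7)

    # If only the values are needed, collect them into an integer vector
    vals <- map_int(res, ~ .x$a[1])
    vals
    # 8 11 15
    ```

    Naming the pipeline as a function keeps the call to map short and makes it easy to reuse on other lists with the same column layout.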