Tags: r, date, select, filter, purrr

Using a custom filter(across()) function with map_if on a list of data frames in R


I'm trying to filter several data frames stored in a list, based on a character column that holds date-like values.

I know that every data frame has n entities (here 10), so I try to spot the long-format data frames, where each row is an entity-date pair, by checking whether they have more than n rows.
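Because every data frame holds the same n entities, `nrow(.x) > 10` works as a test for long format. Below is a minimal sketch of that check, shown next to an alternative that looks for repeated IDs instead. `is_long()` is a hypothetical helper, and it assumes each data frame has an `ID` column, as in the example data.

```r
library(purrr)

# Hypothetical helper: a data frame is "long" when its ID column repeats,
# i.e. it has more than one row per entity (alternative to nrow(.x) > 10)
is_long <- function(df, id = "ID") anyDuplicated(df[[id]]) > 0

wide <- data.frame(ID = letters[1:10], Var1 = 1:10)
long <- data.frame(ID = rep(letters[1:10], 2), Var1 = 1:20)
dfs  <- list(wide = wide, long = long)

# Row-count rule used in the question (relies on exactly 10 entities)
by_rows <- map_lgl(dfs, ~ nrow(.x) > 10)

# Repeated-ID rule: does not depend on knowing n
by_ids <- map_lgl(dfs, is_long)

by_rows
by_ids
```

The ID-based rule keeps working if the number of entities changes, at the cost of assuming a common ID column name.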

Then I'd like to apply a filtering function to these long data frames that finds the date-like character column and keeps only one year.

I've read that filter_if() is superseded, so I tried to use the across() syntax instead, but it has not worked so far.
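For reference, the superseded scoped verb still runs on a single data frame. This sketch puts the filter_if() form next to the where() + if_all() form on a small made-up data frame (`df` is illustrative only), reusing the question's guessdate().

```r
library(dplyr)
library(stringr)

# Date detector from the question
guessdate <- function(x) !all(is.na(as.Date(as.character(x), format = "%d/%m/%Y")))

df <- data.frame(ID   = rep(c("a", "b"), 2),
                 X    = c("01/01/2018", "01/01/2018", "01/01/2019", "01/01/2019"),
                 Var1 = c(0.1, 0.2, 0.3, 0.4))

# Superseded scoped verb: columns picked by a predicate,
# the condition applied to all of them via all_vars()
old <- filter_if(df, guessdate, all_vars(str_detect(., "2018")))

# Current equivalent: where() selects the columns,
# if_all() keeps rows where every selected column passes the test
new <- filter(df, if_all(where(guessdate), ~ str_detect(.x, "2018")))

old
new
```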

Any idea why it is not working?


library(dplyr)
library(purrr)
library(stringr)

## a list of df

list_df <- list(A = data.frame(ID = letters[1:10],
                               Var1 = rnorm(10),
                               Var2 = rnorm(10),
                               Var3 = rnorm(10)),
                B = data.frame(ID = rep(letters[1:10],3),
                               X = c(rep("01/01/2018", 10), 
                                     rep("01/01/2019", 10),
                                     rep("01/01/2020", 10)),
                               Var1 = rnorm(30),
                               Var2 = rnorm(30)),
                C = data.frame(ID = rep(letters[1:10],2),
                               D = c(rep("01/01/2018", 10), 
                                     rep("01/01/2019", 10)),
                               Var1 = rnorm(20),
                               Var2 = rnorm(20)))

## a custom function to find the character columns that hold dates (B$X and C$D)

guessdate <- function(x) !all(is.na(as.Date(as.character(x),format="%d/%m/%Y"))) 

# test the function on one df
list_df[["B"]] %>% map(., guessdate)

## what I've tried so far
list_df %>% map_if(.p = ~ nrow(.x) > 10,  # apply function only on dataframe with more than 10 rows
                          ~ filter(across(where(map(., guessdate)), ~ str_detect(.x, "2018")))) ## filter the date-like column keeping only (2018) 


## desired output (Var values come from rnorm(), so only which rows are kept matters)

output <- list(A = data.frame(ID = letters[1:10],
                               Var1 = rnorm(10),
                               Var2 = rnorm(10),
                               Var3 = rnorm(10)),
                B = data.frame(ID = letters[1:10],
                               X = c(rep("01/01/2018", 10)),
                               Var1 = rnorm(10),
                               Var2 = rnorm(10)),
                C = data.frame(ID = letters[1:10],
                               D = c(rep("01/01/2018", 10)),
                               Var1 = rnorm(10),
                               Var2 = rnorm(10)))


Solution

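  • It looks like you have an extra map() in there and are not passing the data frame to filter(). where() takes the predicate function itself (guessdate), not the list of results that map(., guessdate) returns. Try: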

    list_df %>% map_if(.p = ~ nrow(.x) > 10,
            ~ filter(.x, if_all(where(guessdate), ~ str_detect(.x, "2018"))))
    

    Using across() in filter() was deprecated in dplyr 1.0.8, so we use if_all() here: it keeps a row only when every selected column satisfies the condition.
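str_detect(.x, "2018") matches the substring anywhere in the value. A slightly stricter variant parses each value with the same "%d/%m/%Y" format that guessdate() uses and compares the year. This is a sketch under that assumption; in_year() is a hypothetical helper, not part of dplyr or purrr, and the list below is a trimmed copy of the question's data.

```r
library(dplyr)
library(purrr)

guessdate <- function(x) !all(is.na(as.Date(as.character(x), format = "%d/%m/%Y")))

# Hypothetical helper: TRUE where the value parses as %d/%m/%Y and falls in `year`
in_year <- function(x, year) {
  d <- as.Date(as.character(x), format = "%d/%m/%Y")
  !is.na(d) & format(d, "%Y") == as.character(year)
}

set.seed(42)
list_df <- list(
  A = data.frame(ID = letters[1:10], Var1 = rnorm(10)),
  B = data.frame(ID = rep(letters[1:10], 3),
                 X = rep(c("01/01/2018", "01/01/2019", "01/01/2020"), each = 10),
                 Var1 = rnorm(30)),
  C = data.frame(ID = rep(letters[1:10], 2),
                 D = rep(c("01/01/2018", "01/01/2019"), each = 10),
                 Var1 = rnorm(20))
)

# Same structure as the answer, with the year test swapped in
res <- list_df %>%
  map_if(.p = ~ nrow(.x) > 10,
         ~ filter(.x, if_all(where(guessdate), ~ in_year(.x, 2018))))

map_int(res, nrow)
```

A (10 rows) is left untouched by map_if(), while B and C are cut down to their 2018 rows.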