r, function, filtering, user-input

Filter data on user-defined columns and values in R (multiple columns and filters)


I am attempting to create a function that lets a user specify an arbitrary number of columns and apply a matching filter to each of those columns.

df <- data.frame(a=1:10, b=round(runif(10)), c=round(runif(10)))
|a| b|c|
|1| 1|1|
|2| 0|0|
|3| 0|1|
|4| 1|0|
|5| 1|0|
|6| 1|0|
|7| 1|1|
|8| 1|1|
|9| 1|0|
|10|1|1|

I would like the user to be able to filter the data on any of the columns and apply a different filter to each column. I know the following does not work, but it shows the general idea.

test <- function(df, fCol, fParam){
    df %>% filter(fCol[1] %in% fParam[1] | fCol[2] %in% fParam[2])
}
test(df, c("b","c"), c(1,0))
# which I would want to return:
|a|b|c|
|4|1|0|
|5|1|0|
|6|1|0|
|9|1|0|

The issue that I run into is that I won't know how many columns the user will want to filter, nor will I know the column names.

Any help at all would be greatly appreciated. Please ask questions if you have them. I tried my best to give a reprex.


Solution

  • I believe this does what you want:

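    library(tidyr)
    library(dplyr)

    # Keep rows of df where the column named fCol[i] equals fParam[i].
    # match_type = "any" combines the conditions with |, "all" with &.
    test <- function(df,
                     fCol,
                     fParam,
                     match_type = "any")
    {
      # Check the length first so the condition is always a single TRUE/FALSE.
      if (length(match_type) != 1 || !is.element(match_type, c("any", "all"))) {
        stop('match_type must be either "any" or "all"')
      }
      # Temporary row id, used to join the per-row result back onto df.
      df <- df %>% ungroup() %>%
        mutate(..id.. = row_number())
      # One row per (column, value) pair to match.
      meta <- data.frame(fCol = fCol, fParam = fParam, stringsAsFactors = FALSE)
      logi <- df %>%
        select("..id..", all_of(unique(fCol))) %>%
        gather(key = "key", value = "value", -..id..) %>%
        left_join(., y = meta, by = c("key" = "fCol")) %>%
        mutate(match = value == fParam) %>%
        select(-key, -value, -fParam) %>%
        group_by_at(setdiff(names(.), "match")) %>%
        # Collapse each row's comparisons into a single TRUE/FALSE.
        summarise(match = if (match_type == "any") any(match) else all(match))
      df2 <- left_join(df, logi, by = intersect(colnames(df), colnames(logi))) %>%
        filter(match) %>%
        select(-match, -..id..)
      return(df2)
    }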
    
    df <- data.frame(a=1:10, b=round(runif(10)), c=round(runif(10)))
    df
    #    a b c
    #1   1 0 1
    #2   2 1 0
    #3   3 0 0
    #4   4 0 1
    #5   5 0 1
    #6   6 0 1
    #7   7 1 0
    #8   8 1 1
    #9   9 1 0
    #10 10 1 0
    
    #use "any" to do an | match
    test(df, c("b","c"),c(1,0), match_type = "any")
    #   a b c
    #1  2 1 0
    #2  3 0 0
    #3  7 1 0
    #4  8 1 1
    #5  9 1 0
    #6 10 1 0
    
    #use "all" to do an & match
    test(df, c("b","c"),c(1,0), match_type = "all")
    #   a b c
    #1  2 1 0
    #2  7 1 0
    #3  9 1 0
    #4 10 1 0
    

    You can also pass the same column name in fCol more than once if you want to match several values in that column:

    test(df, c("b","b"),c(1,0)) #matches everything but you get the point
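
For comparison, the same idea can be sketched in base R without reshaping the data. This is only an illustrative alternative, not the approach used above: `filter_by` and its `filters` argument are hypothetical names, and the filters are assumed to arrive as a named list (column name = accepted values). The data frame is typed out from the example output above so that the result is reproducible.

```r
# Hypothetical helper (not part of the answer above): `filters` is a named
# list, one entry per column, holding the values accepted in that column.
filter_by <- function(df, filters, match_type = c("any", "all")) {
  match_type <- match.arg(match_type)
  if (length(filters) == 0) {
    return(df)  # nothing to filter on
  }
  unknown <- setdiff(names(filters), names(df))
  if (length(unknown) > 0) {
    stop("Unknown column(s): ", paste(unknown, collapse = ", "))
  }
  # One logical vector per column: TRUE where the row's value is accepted.
  hits <- Map(function(col, values) df[[col]] %in% values,
              names(filters), filters)
  # Combine the vectors with | for "any" or & for "all".
  combine <- if (match_type == "any") `|` else `&`
  keep <- Reduce(combine, hits)
  df[keep, , drop = FALSE]
}

# Same values as the example data printed above, typed out explicitly.
df <- data.frame(a = 1:10,
                 b = c(0, 1, 0, 0, 0, 0, 1, 1, 1, 1),
                 c = c(1, 0, 0, 1, 1, 1, 0, 1, 0, 0))

any_rows  <- filter_by(df, list(b = 1, c = 0), match_type = "any")
all_rows  <- filter_by(df, list(b = 1, c = 0), match_type = "all")
both_vals <- filter_by(df, list(b = c(0, 1)))  # several values for one column
```

Because each list entry can hold several values, matching multiple values in one column does not require repeating the column name. The number of columns is simply the length of the list, so it does not need to be known in advance.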