
Return list of columns containing data outside a predetermined range in R


To keep only the columns of interest in a data.frame, I need to find the columns that contain data outside a specific range. Let the data.frame and the ranges be

df <- data.frame(x1 = c(1, 5, 9), x2 = c(10, 20, 30), x3 = c(20, 100, 1000))
ranges <- data.frame(y1 = c(3, 8), y2 = c(10, 20), y3 = c(15, 1250))

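As output, I'd like the names of the columns that have at least one value outside the corresponding range (each column of "ranges" holds the lower and upper bound for the matching column of "df"): "x1", "x2"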

I tried the following, but it only works if "ranges" enumerates every number, as below, and it matches only when a value is found among them. That's unfortunately not what I need.

ranges <- c(15:300, 10:20)
df.l <- colnames(df)[sapply(df, function(x) any(x %in% ranges))]

Any ideas? Thanks!


Solution

  • If 'ranges' is a data.frame or list, one option is

    names(which(unlist(Map(function(x, y) any(!(x >= y[1] & x <= y[2])), df, ranges))))
    #[1] "x1" "x2"
    

    Or use the inverse logic, testing directly for values below the lower bound or above the upper bound

    names(which(unlist(Map(function(x, y) any(x < y[1]| x > y[2]), df, ranges))))
    

    Or in tidyverse,

    library(purrr)
    library(dplyr)
    library(tibble)
    library(tidyr)  # provides unnest()
    map2(df, ranges, ~ between(.x, .y[1], .y[2]) %>% `!` %>% any) %>% 
        enframe %>% 
        unnest(cols = value) %>% 
        filter(value) %>% 
        pull(name)
    #[1] "x1" "x2"
    

    data

    df <- data.frame(x1 = c(1, 5, 9), x2 = c(10, 20, 30), x3 = c(20, 100, 1000))
    ranges <- data.frame(y1 = c(3, 8), y2 = c(10, 20), y3 = c(15, 1250))
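
For reuse, the base R approach above could be wrapped in a small function. This is only a sketch: the name `cols_outside_range` and its arguments are hypothetical, not part of the original answer. It assumes that each column of `bounds` holds a lower bound in row 1 and an upper bound in row 2, and that columns are paired by position.

```r
# Example data from the question.
df <- data.frame(x1 = c(1, 5, 9), x2 = c(10, 20, 30), x3 = c(20, 100, 1000))
ranges <- data.frame(y1 = c(3, 8), y2 = c(10, 20), y3 = c(15, 1250))

# Hypothetical helper: returns the names of the columns in `data` that hold
# at least one value outside [lower, upper], where the bounds for column i
# are taken from rows 1 and 2 of column i in `bounds`.
cols_outside_range <- function(data, bounds) {
  # Columns are matched by position, so both inputs need the same width.
  stopifnot(ncol(data) == ncol(bounds), nrow(bounds) == 2)
  outside <- vapply(
    seq_along(data),
    function(i) {
      x <- data[[i]]
      lower <- bounds[[i]][1]
      upper <- bounds[[i]][2]
      # na.rm = TRUE: missing values are not counted as out of range.
      any(x < lower | x > upper, na.rm = TRUE)
    },
    logical(1)
  )
  names(data)[outside]
}

result <- cols_outside_range(df, ranges)
print(result)
```

Because the pairing is positional, as it is with `Map` above, the columns of `ranges` need to be in the same order as the columns of `df`. If no column has out-of-range values, the function returns an empty character vector.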