r list data-manipulation

Selecting Rows From a Table Based on a List


I have this table:

col1 <- c("1","2", "3", "4", "5")
col1 <- sample(col1, 1000, replace=TRUE, prob=c(0.2, 0.2, 0.2, 0.2, 0.2))

col2 <- c("6","7", "8")
col2 <- sample(col2, 1000, replace=TRUE, prob=c(0.2, 0.4, 0.4))

col3 <- c("9","10", "11", "12")
col3 <- sample(col3, 1000, replace=TRUE, prob=c(0.1, 0.1, 0.4, 0.4))

col4 <- rexp( 1000, 0.5)
col5 <- rexp( 1000, 0.5)
id <- 1:1000

table_1 = data.frame(id, col1, col2, col3, col4, col5)

And this list:

f <- function(set) { 
    n <- length(set)
    masks <- 2^(1:n-1)
    lapply( 1:2^n-1, function(u) set[ bitwAnd(u, masks) != 0 ] )
}

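# col1 and col3 are character vectors, so min(col1) is "1" and max(col3) is "9"
# (string comparison); the call is therefore f(1:9), which gives 2^9 = 512 subsets
sample_list = f(min(col1):max(col3))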
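The precedence in `1:n-1` and `1:2^n-1` is easy to misread: both parse as `(1:n) - 1` and `(1:2^n) - 1`. As a rough sketch, here is an equivalent with explicit parentheses. The name `power_set` is made up for illustration and is not part of the original code.

```r
# Hypothetical rewrite of f() with the operator precedence spelled out.
power_set <- function(set) {
  n <- length(set)
  # One bit mask per element: 1, 2, 4, ..., 2^(n - 1)
  masks <- 2^(seq_len(n) - 1)
  # Each u in 0 .. 2^n - 1 encodes one subset; keep the elements whose bit is set
  lapply(0:(2^n - 1), function(u) set[bitwAnd(u, masks) != 0])
}

subsets <- power_set(1:9)
length(subsets)   # 512 subsets, the first one being the empty set
subsets[[381]]    # u = 380, the same entry as sample_list[381] in the question
```

Because list positions start at 1 while the bit patterns start at 0, entry `k` of the list corresponds to the bit pattern of `k - 1`.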

I want to select rows from "table_1" based on entries in "sample_list". For example:

select = as.integer(runif(1, min = 1, max = 512))

> select
[1] 381

my_select = sample_list[select]

sample_list[381]
[[1]]
[1] 3 4 5 6 7 9

Is there some way to "quickly" select all rows in "table_1" where table_1$col1, table_1$col2 and table_1$col3 all have values that are contained in "my_select"?

This would be the equivalent of:

subset(table_1, col1 %in% c("3", "4", "5") &  col2 %in% c("6", "7") &  col3 %in% c("9"))

Thank you!


Solution

  • I'm not sure whether you want every column to contain a value from the selected list entry, or just one of them.

    Here is one solution that returns the indices of the rows where all three columns match:

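    my_select <- function(index){
      # For each row, check that col1, col2 and col3 (columns 2:4) are all in the chosen subset
      matches <- apply(table_1[2:4], 1, \(x) all(x %in% sample_list[[index]]))
      which(matches)  # indices of the matching rows
    }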
    
    [1] 131 146 174 179 205 272 396 450 500 512 574 589 619 669 673 703 736 751 887 893 925 992
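
If you want the rows themselves rather than their positions, a column-wise check avoids the row-by-row `apply()`. This is only a sketch: `select_rows` and the `toy` data frame are made-up names and values for illustration, and the allowed set is converted to character on the assumption that the columns are stored as strings, as in the question.

```r
# Hypothetical helper: returns the matching rows, not just their indices.
select_rows <- function(df, cols, allowed) {
  # Compare as character, since the columns in the question hold strings
  allowed <- as.character(allowed)
  # One logical vector per column: is each value in the allowed set?
  hits <- lapply(df[cols], function(v) v %in% allowed)
  # Keep a row only if every listed column matched
  df[Reduce(`&`, hits), , drop = FALSE]
}

# Toy data in the same shape as table_1 (values invented for illustration)
toy <- data.frame(
  id   = 1:4,
  col1 = c("3", "1", "5", "4"),
  col2 = c("6", "7", "8", "7"),
  col3 = c("9", "9", "9", "12")
)

# Only the row with id 1 has all three values inside the set
select_rows(toy, c("col1", "col2", "col3"), c(3, 4, 5, 6, 7, 9))
```

With the data from the question, the call would presumably be `select_rows(table_1, c("col1", "col2", "col3"), sample_list[[381]])`.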