Search code examples
rlistdplyrdatatabledata.table

How to Filter Data Table Rows with condition on column of Type list() in R


I have a data.table which looks something like this (showing just few columns out of many) -

Id Period Product
1000797366 2018-Q1 UG10000-WISD
1000797366 2018-Q1 NX11100, UG10000-WISD, UG12210
1000797366 2018-Q1 UG10000-WISD, UG12210
1000797366 2018-Q1 UG10000-WISD, UG12210
1000797366 2018-Q1 UG12210
1000797366 2018-Q1 NX11100
1000797366 2018-Q1 NX11100

Here the column "Product" is of type list() as I've to keep it this way for some later usage.

But I am facing a problem while filtering the rows based on a condition on Product column.

What I want is to filter all rows where value of product can be any of this vector c("UG12210","UG10000-WISD") along-with other filer such as Period in c("2018-Q1").

So my output should look something like this -

Id Period Product
1000797366 2018-Q1 UG10000-WISD
1000797366 2018-Q1 NX11100, UG10000-WISD, UG12210
1000797366 2018-Q1 UG10000-WISD, UG12210
1000797366 2018-Q1 UG10000-WISD, UG12210
1000797366 2018-Q1 UG12210

But somehow this is not happening, I tried following conditions but none worked.

data_test[Period %in% c("2018-Q1") & is.element("UG12210",Product),]

data_test[Period %in% c("2018-Q1") & Product %in% c("UG12210"),]

Any leads on how it can be achieved will be of great help. Thanks!

Below is the data using dput() for the datatable

structure(
  list(
    Id = c("1000797366", "1000797366", "1000797366", "1000797366", "1000797366", "1000797366", "1000797366"),
    Period = c("2018-Q1", "2018-Q1", "2018-Q1", "2018-Q1", "2018-Q1", "2018-Q1", "2018-Q1"),
    Product = list("UG10000-WISD", c("NX11100", "UG10000-WISD", "UG12210"), c("UG10000-WISD", "UG12210"),
      c("UG10000-WISD", "UG12210"), "UG12210", "NX11100", "NX11100")
  ),
  row.names = c(NA,-7L),
  class = c("data.table", "data.frame"),
  .internal.selfref = < pointer:0x562f66275020 >
)

Solution

  • You can use sapply function to check if any of the values in vals is in Product for each row:

    vals = c("UG12210","UG10000-WISD")
    
    dt[Period %chin% "2018-Q1" & sapply(Product, function(v) any(vals %chin% v))]
    
    #            Id  Period                      Product
    # 1: 1000797366 2018-Q1                 UG10000-WISD
    # 2: 1000797366 2018-Q1 NX11100,UG10000-WISD,UG12210
    # 3: 1000797366 2018-Q1         UG10000-WISD,UG12210
    # 4: 1000797366 2018-Q1         UG10000-WISD,UG12210
    # 5: 1000797366 2018-Q1                      UG12210