I am unable to perform a standard 'in' operation with a pre-defined list of items. I am looking to do something like this:
# Construct a simple example frame
from datatable import *
df = Frame(V1=['A','B','C','D'], V2=[1,2,3,4])
# Filter frame to a list of items (THIS DOES NOT WORK)
items = ['A','B']
df[f.V1 in items,:]
This example results in the error:
TypeError: A boolean value cannot be used as a row selector
Unfortunately, there doesn't appear to be a built-in object for 'in' operations. I would like to use something like the %in% operator that is native to the R language. Is there any method for accomplishing this in Python?
I can take this approach with the use of multiple 'equals' operators, but this is inconvenient when you want to consider a large number of items:
df[(f.V1 == 'A') | (f.V1 == 'B'),:]
datatable 0.10.1
python 3.6
You could also try this out:
First, import the necessary packages:
import datatable as dt
from datatable import by,f,count
import functools
import operator
Create a sample datatable:
DT = dt.Frame(V1=['A','B','C','D','E','B','A'], V2=[1,2,3,4,5,6,7])
Make a list of the values to filter on; in your case it is:
sel_obs = ['A','B']
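Now create a filter expression using the functools and operator modules. This ORs together one equality test per value, so it builds the same expression as chaining (f.V1 == 'A') | (f.V1 == 'B') by hand: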
filter_rows = functools.reduce(operator.or_,(f.V1==obs for obs in sel_obs))
Finally, apply the filter created above to the datatable:
DT[filter_rows,:]
The output is:
Out[6]:
| V1 V2
-- + -- --
0 | A 1
1 | B 2
2 | B 6
3 | A 7
[4 rows x 2 columns]
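The reduce step above can be wrapped in a small reusable helper. This is only a minimal sketch, not part of datatable: any_equal is a hypothetical name, and the Expr class is a stand-in for datatable's f-expressions so that the snippet runs without the library and shows which expression gets built.

```python
import functools
import operator


def any_equal(column, values):
    """Build (column == v1) | (column == v2) | ... for any object
    that overloads == and | (hypothetical helper, not a datatable API)."""
    values = list(values)
    if not values:
        raise ValueError("values must contain at least one item")
    return functools.reduce(operator.or_, (column == v for v in values))


class Expr:
    """Stand-in for a lazy column expression; it only records its own text."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        # Return a new expression instead of a bool, like a lazy column would.
        return Expr(f"({self.text} == {other!r})")

    def __or__(self, other):
        return Expr(f"({self.text} | {other.text})")

    def __repr__(self):
        return self.text


built = any_equal(Expr("V1"), ["A", "B"])
print(built)  # ((V1 == 'A') | (V1 == 'B'))
```

With datatable itself, the equivalent call would be DT[any_equal(f.V1, sel_obs),:], which passes the same expression as filter_rows above.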
You can play around with the operators to do different types of filtering.
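As one example of swapping operators, a "not in" style filter can be sketched by combining inequality tests with operator.and_ instead of operator.or_. The names none_equal and Expr are hypothetical; Expr is again only a stand-in that records the expression text, and the snippet assumes the real column expression supports != and & in the same way it supports == and |.

```python
import functools
import operator


def none_equal(column, values):
    """Build (column != v1) & (column != v2) & ... (hypothetical helper)."""
    values = list(values)
    if not values:
        raise ValueError("values must contain at least one item")
    return functools.reduce(operator.and_, (column != v for v in values))


class Expr:
    """Stand-in for a lazy column expression; it only records its own text."""

    def __init__(self, text):
        self.text = text

    def __ne__(self, other):
        # Return a new expression rather than a bool.
        return Expr(f"({self.text} != {other!r})")

    def __and__(self, other):
        return Expr(f"({self.text} & {other.text})")

    def __repr__(self):
        return self.text


built = none_equal(Expr("V1"), ["A", "B"])
print(built)  # ((V1 != 'A') & (V1 != 'B'))
```

Under that assumption, DT[none_equal(f.V1, sel_obs),:] would keep only the rows whose V1 is neither 'A' nor 'B'.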
@sammyweemy's solution should also work.