How do I use a filter in a reticulate pyarrow compute expression?
At base I have a pyarrow dataset (in this case called woodcan) that I want to turn into a table with a filter.
tab <- woodcan$to_table(ds$field('Region')=='Canada')
The above fails with: Error in py_compare_impl(a, b, op) : ValueError: An Expression cannot be evaluated to python True or False. If you are using the 'and', 'or' or 'not' operators, use '&', '|' or '~' instead.
How is that syntax supposed to look?
You could generate the expression by running Python code with py_run_string or py_run_file and pass it to the filter argument of to_table:
library(reticulate)
run.py <- py_run_string('
import pyarrow.dataset as ds
expr = ds.field("Region") == "Canada"
')
woodcan$to_table(filter=run.py$expr)
The code above requires pyarrow to be installed beforehand for use with reticulate:
virtualenv_create("arrow-env")
arrow::install_pyarrow("arrow-env")
use_virtualenv("arrow-env")
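As for why the original R call fails while the py_run_string version works: the traceback suggests that `==` on reticulate objects goes through py_compare_impl, which appears to want a plain TRUE/FALSE answer, whereas in Python `==` on a field simply returns another expression object. The sketch below illustrates that difference with a toy class. `Expr`, `field` and `compare_as_bool` are hypothetical stand-ins written for this example, not pyarrow's or reticulate's actual code.

```python
# Toy stand-in for a lazy expression type; NOT pyarrow's real class.
class Expr:
    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        # Comparison builds a new expression instead of returning a bool.
        return Expr(f"({self.text} == {other!r})")

    def __bool__(self):
        # Same wording as the error message quoted in the question.
        raise ValueError(
            "An Expression cannot be evaluated to python True or False."
        )

    def __repr__(self):
        return f"<Expr {self.text}>"


def field(name):
    # Hypothetical helper playing the role of ds.field(...).
    return Expr(name)


# Inside Python, == just hands back the expression object.
expr = field("Region") == "Canada"


def compare_as_bool(a, b):
    # Assumed behaviour of a wrapper that insists on True/False.
    return bool(a == b)


try:
    compare_as_bool(field("Region"), "Canada")
    failed = False
    message = ""
except ValueError as err:
    failed = True
    message = str(err)
```

Building the expression inside Python, as in the py_run_string call above, sidesteps the forced conversion, so to_table receives an actual expression object.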