pyarrow compute expression in reticulate


How do I use a pyarrow compute expression as a filter from reticulate?

I have a pyarrow dataset (here called woodcan) that I want to convert to a table while applying a filter.

tab <- woodcan$to_table(ds$field('Region')=='Canada')

The above fails with: Error in py_compare_impl(a, b, op) : ValueError: An Expression cannot be evaluated to python True or False. If you are using the 'and', 'or' or 'not' operators, use '&', '|' or '~' instead.

How is that syntax supposed to look?


Solution

  • You could build the expression in Python with py_run_string or py_run_file, then pass it to the filter argument of to_table:

    library(reticulate)
    
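    # Build the filter on the Python side, where == yields an Expression
    run.py <- py_run_string('
    import pyarrow.dataset as ds
    expr = ds.field("Region") == "Canada"
    ')
    
    # py_run_string returns a dictionary of the variables the code defined
    tab <- woodcan$to_table(filter=run.py$expr)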
    

    The code above requires pyarrow to be installed in the Python environment that reticulate uses:

    virtualenv_create("arrow-env")
    arrow::install_pyarrow("arrow-env")
    use_virtualenv("arrow-env")
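
As for why the original R-side comparison fails: the py_compare_impl frame in the error suggests that reticulate's == asks Python for a plain True/False answer, while a pyarrow expression is designed to return another expression from == and to refuse conversion to a boolean. The sketch below imitates that behaviour in pure Python. ToyExpression and field are hypothetical stand-ins written for illustration only; they are not pyarrow's classes or its implementation.

```python
# Toy stand-in for a lazy filter expression; NOT pyarrow's real class.
class ToyExpression:
    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        # Comparison builds a new expression instead of returning a bool.
        return ToyExpression(f"({self.text} == {other!r})")

    def __bool__(self):
        # Imitates the refusal quoted in the error message of the question.
        raise ValueError("An Expression cannot be evaluated to python True or False.")

    def __repr__(self):
        return f"<ToyExpression {self.text}>"


def field(name):
    # Hypothetical helper that plays the role of ds.field in this sketch.
    return ToyExpression(name)


# Built inside Python, == simply returns another expression object.
expr = field("Region") == "Canada"

# Forcing the result into a plain boolean is what raises the error.
try:
    bool(expr)
    coerced = True
except ValueError as err:
    coerced = False
    message = str(err)
```

Under that assumption, building the expression inside Python (as py_run_string does) sidesteps the problem because the comparison result is never coerced to a boolean.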