How can I read a CSV into a DataFusion DataFrame with datafusion-python?
Here's what I have so far:
import datafusion
ctx = datafusion.SessionContext()
I couldn't find any instructions in the docs.
I am using DataFusion v0.6.0.
There is some documentation here: https://github.com/apache/arrow-datafusion/blob/master/docs/source/python/index.rst
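Here is one of the examples. It registers the CSV file as a table with register_csv, and ctx.sql then returns a DataFrame over that table: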
import datafusion
from datafusion import functions as f
from datafusion import col
import pyarrow
# create a context
ctx = datafusion.SessionContext()
# register a CSV
ctx.register_csv('example', 'example.csv')
# create a new statement via SQL
df = ctx.sql("SELECT a+b, a-b FROM example")
# execute and collect the first (and only) batch
result = df.collect()[0]
assert result.column(0) == pyarrow.array([5, 7, 9])
assert result.column(1) == pyarrow.array([-3, -3, -3])
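The example assumes that a file named example.csv already exists in the working directory. The docs excerpt does not show its contents, so the following is only a sketch of one file that would satisfy the assertions. The values are inferred by solving a+b = 5, 7, 9 and a-b = -3, -3, -3, and the variable name rows is just for illustration:

```python
import csv

# Assumed data, inferred from the assertions in the example:
# a + b == [5, 7, 9] and a - b == [-3, -3, -3] give a = 1, 2, 3 and b = 4, 5, 6.
rows = [(1, 4), (2, 5), (3, 6)]

# newline="" lets the csv module control line endings itself.
with open("example.csv", "w", newline="") as fh:
    writer = csv.writer(fh)
    writer.writerow(["a", "b"])  # header row: the column names used in the SQL query
    writer.writerows(rows)       # one line per (a, b) pair
```

Running this once before the example should give register_csv a file with integer columns a and b to read.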
Work is underway to move the documentation to the datafusion-python repo (see https://github.com/apache/arrow-datafusion/issues/2866).