c# apache-arrow apache-arrow-flight

Transform Arrow.Table to Arrow.RecordBatch


I'm working with some Arrow data in C# as a Table and need to convert it to a RecordBatch to send over the wire via Arrow Flight. It's trivial to go the other way via Table.TableFromRecordBatches, like this:

var schema = recordBatch.Schema;
var table = Table.TableFromRecordBatches(schema, new List<RecordBatch>{recordBatch});

I can't find a way to do the reverse. Does this exist, should it exist, or is it just not implemented yet?

Follow up question - should I just avoid using Table at all? It seems like most interop needs are met with RecordBatch, and maybe Table is not useful.


Solution

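  • A RecordBatch is essentially a single chunk of a Table rather than a Table without a schema: it carries its own schema (the question's code reads recordBatch.Schema). Conceptually, an Apache Arrow Table consists of a schema plus records, where the records are held as one or more RecordBatches.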

    To answer your question: yes, ideally the reverse should exist. I faced a similar problem with Apache Arrow JS, and in my experience the community support for the JS library is weaker than for pyarrow, C++ or Java.

    That said, I think it is possible to get a RecordBatch from a Table. In Apache Arrow JS, I solved my problem by overriding the method that returns the table's schema so that it is a no-op.
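
For the C# side of the question, here is a minimal sketch of the reverse direction. It assumes the Table was built from RecordBatches (for example with Table.TableFromRecordBatches), so that every column has the same number of chunks with matching lengths. ToRecordBatches is a hypothetical helper, not part of Apache.Arrow. It relies on Table.Column(i), Column.Data, ChunkedArray.ArrayCount and ChunkedArray.Array(i) as I understand them from the Apache.Arrow package, so check them against the version you are using.

```csharp
using System;
using System.Collections.Generic;
using Apache.Arrow;

public static class TableExtensions
{
    // Hypothetical helper (not part of Apache.Arrow): rebuilds one RecordBatch
    // per chunk. Assumes every column is chunked identically, which should hold
    // for tables created with Table.TableFromRecordBatches.
    public static List<RecordBatch> ToRecordBatches(Table table)
    {
        var batches = new List<RecordBatch>();
        if (table.ColumnCount == 0)
        {
            return batches;
        }

        // Use the first column's chunk layout as the reference layout.
        int chunkCount = table.Column(0).Data.ArrayCount;

        for (int chunk = 0; chunk < chunkCount; chunk++)
        {
            var arrays = new List<IArrowArray>(table.ColumnCount);
            for (int col = 0; col < table.ColumnCount; col++)
            {
                ChunkedArray data = table.Column(col).Data;
                if (data.ArrayCount != chunkCount)
                {
                    throw new InvalidOperationException(
                        "Columns are not chunked identically.");
                }
                arrays.Add(data.Array(chunk));
            }

            // All arrays in one batch must have the same number of rows.
            int length = arrays[0].Length;
            foreach (IArrowArray array in arrays)
            {
                if (array.Length != length)
                {
                    throw new InvalidOperationException(
                        "Chunk lengths differ between columns.");
                }
            }

            batches.Add(new RecordBatch(table.Schema, arrays, length));
        }

        return batches;
    }
}
```

Each resulting batch shares the table's schema and can be sent on its own. If the columns were chunked differently, the helper throws instead of guessing, and the data would need to be re-sliced into aligned chunks first.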