c++ apache-arrow

Merging Tables in Apache Arrow


I have two arrow::Table objects, where table 1 is:

colA        colB
1           2
3           4

and table 2 is:

colC        colD
i           j
k           l

where both table 1 and 2 have the same number of rows. I would like to join them side-by-side as

colA        colB        colC        colD
1           2           i           j
3           4           k           l

I'm trying to use arrow::ConcatenateTables as follows, but I'm getting a bunch of nulls in my output (not shown):

auto t1 = ...;  // std::shared_ptr<arrow::Table>
auto t2 = ...;  // std::shared_ptr<arrow::Table>
arrow::ConcatenateTablesOptions options;
options.unify_schemas = true;
options.field_merge_options.promote_nullability = true;
auto merged = arrow::ConcatenateTables({t1, t2}, options);

How do I obtain the expected output?


Solution

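  • arrow::ConcatenateTables only does row-wise concatenation. With unify_schemas enabled, the output schema is the union of both tables' columns, and each table's rows are filled with nulls for the columns it lacks, which is where your nulls come from. There is no built-in helper for column-wise concatenation, but it is easy enough to write one yourself (apologies if this is not quite right, I'm not in front of a compiler at the moment):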

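    #include <arrow/api.h>

    #include <memory>
    #include <vector>

    // Joins two tables side by side. Assumes both tables have the same number
    // of rows; this helper does not check that.
    std::shared_ptr<arrow::Table> CombineTables(const arrow::Table& left, const arrow::Table& right) {
      // Start from a copy of the left columns, then append the right ones.
      std::vector<std::shared_ptr<arrow::ChunkedArray>> columns = left.columns();
      const std::vector<std::shared_ptr<arrow::ChunkedArray>>& right_columns = right.columns();
      columns.insert(columns.end(), right_columns.begin(), right_columns.end());
    
      // Build the combined schema in the same order as the columns.
      std::vector<std::shared_ptr<arrow::Field>> fields = left.fields();
      const std::vector<std::shared_ptr<arrow::Field>>& right_fields = right.fields();
      fields.insert(fields.end(), right_fields.begin(), right_fields.end());
    
      return arrow::Table::Make(arrow::schema(std::move(fields)), std::move(columns));
    }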
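
To make the difference between the two operations concrete, here is a rough toy model that uses only the standard library (C++17), with no Arrow dependency. ToyTable, ConcatRowsUnified and CombineColumns are made-up names for illustration, not Arrow APIs, and a std::map keeps columns sorted by name, whereas a real Arrow schema is ordered and may contain duplicate names. The sketch only mimics the shape of the data: row-wise concatenation with a unified schema pads missing cells with nulls, while column-wise combination leaves the row count unchanged.

```cpp
#include <cstddef>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Toy stand-in for a table: column name -> cell values (nullopt models a null).
// These are NOT Arrow types; they only mimic the shape of the data.
using Column = std::vector<std::optional<std::string>>;
using ToyTable = std::map<std::string, Column>;

// Assumes every column in a table has the same length.
std::size_t NumRows(const ToyTable& t) {
  return t.empty() ? 0 : t.begin()->second.size();
}

// Row-wise concatenation with schema unification: the result has the union
// of all columns, and rows from a table lacking a column hold nulls there.
ToyTable ConcatRowsUnified(const ToyTable& top, const ToyTable& bottom) {
  const std::size_t top_rows = NumRows(top);
  const std::size_t bottom_rows = NumRows(bottom);
  ToyTable out;
  for (const auto& [name, col] : top) {
    out[name] = col;
  }
  for (const auto& [name, col] : bottom) {
    Column& dst = out[name];
    dst.resize(top_rows);  // pads with nulls if 'top' lacked this column
    dst.insert(dst.end(), col.begin(), col.end());
  }
  for (auto& entry : out) {
    // Pads columns that 'bottom' lacked; a no-op for all other columns.
    entry.second.resize(top_rows + bottom_rows);
  }
  return out;
}

// Column-wise combination: every column is kept as is, so the row count is
// unchanged. On a name clash, the column from 'left' is kept.
ToyTable CombineColumns(const ToyTable& left, const ToyTable& right) {
  ToyTable out = left;
  out.insert(right.begin(), right.end());
  return out;
}
```

With the two tables from the question, ConcatRowsUnified yields four columns and four rows, half of whose cells are null, while CombineColumns yields four columns and the original two rows, which is the side-by-side layout asked for.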