c++ apache-arrow

How to return a StructArray from Multiple Scalar Functions


I have a scenario where I am working with temporal data in Apache Arrow and am using compute functions to extract date/time components like so:

auto year = arrow::compute::CallFunction("year", {array});
auto month = arrow::compute::CallFunction("month", {array});
auto day = arrow::compute::CallFunction("day", {array});
...

While this works, I have to manage three separate Datums. Ideally, I would like a single function that returns a StructArray containing year/month/day fields and that can scale out to more detailed time components. Is there a simple way of registering such a function with the current API?


Solution

  • Is there a simple way of registering such a function with the current API?

    I don't think so; your use case looks too specific. On the other hand, if you do this often, you can write a small helper that does it for you:

    
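    #include <arrow/api.h>
    #include <arrow/compute/api.h>

    #include <memory>
    #include <string>
    #include <vector>

    // Calls each named compute function on the same arguments and collects the
    // results into one StructArray whose fields are named after the functions.
    // Note: ValueOrDie() aborts on any error (e.g. an unknown function name).
    std::shared_ptr<arrow::Array> CallFunctions(std::vector<std::string> const& functions,
                                                std::vector<arrow::Datum> const& args) {
      std::vector<std::shared_ptr<arrow::Array>> results;
      for (std::string const& function : functions) {
        results.push_back(arrow::compute::CallFunction(function, args).ValueOrDie().make_array());
      }
      return arrow::StructArray::Make(results, functions).ValueOrDie();
    }

    void test() {
      auto array = ....
      auto structArray = CallFunctions({"year", "month", "day"}, {array});
    }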