I have a scenario where I am working with temporal data in Apache Arrow and am using compute functions to extract date/time components like so:
auto year = arrow::compute::CallFunction("year", {array});
auto month = arrow::compute::CallFunction("month", {array});
auto day = arrow::compute::CallFunction("day", {array});
...
While this works, I have to manage three separate Datums. Ideally, I would like one function that returns a StructArray
containing year/month/day fields, and which can also scale out to more detailed time components. Is there a simple way of registering such a function with the current API?
Is there a simple way of registering such a function with the current API?
I don't think so; your use case looks too specific. On the other hand, if you do this often, you can write a small helper that does it for you:
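#include <arrow/api.h>
#include <arrow/compute/api.h>

// Calls each named compute function on the same arguments and packs the
// results into one StructArray whose field names are the function names.
std::shared_ptr<arrow::Array> CallFunctions(std::vector<std::string> const& functions,
                                            std::vector<arrow::Datum> const& args) {
  std::vector<std::shared_ptr<arrow::Array>> results;
  for (std::string const& function : functions) {
    // ValueOrDie() aborts the process if the call fails.
    results.push_back(arrow::compute::CallFunction(function, args).ValueOrDie().make_array());
  }
  return arrow::StructArray::Make(results, functions).ValueOrDie();
}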
void test() {
  auto array = ....
  // structArray has three fields named "year", "month" and "day".
  auto structArray = CallFunctions({"year", "month", "day"}, {array});
}