I have a Parquet file that I created with the Python Polars package. It has a single column of variable-length strings that looks like:
┌──────────┐
│ str_list │
│ ---      │
│ str      │
╞══════════╡
│ ALV5     │
│ SMGWX    │
│ NEGOT    │
│ S2U0S    │
│ …        │
│ KFO      │
│ LJ3J     │
│ PCY6O    │
│ GQ0W7    │
└──────────┘
I am trying to read this file in C++ into string variables, but I am not sure what to cast the column to, since its type turns out to be LARGE_STRING. The following assertion passes:
assert(record_batch->column(0)->type_id() == arrow::Type::LARGE_STRING);
I can do
auto strlist = std::static_pointer_cast<arrow::LargeStringArray>(record_batch->column(0));
but I cannot find any member function of strlist that lets me copy the strings to my own string variable.
You can use LargeStringArray::GetView(i) to get a std::string_view, so you don't have to heap-allocate a std::string for every string in the array.
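If you do want owning copies, as the question asks, one option is to build each std::string from the view. Below is a minimal sketch, not a definitive implementation. CopyStrings is a hypothetical helper name, not part of Arrow. It is written as a template so the snippet itself needs only the standard library, and it assumes the array type provides length(), IsNull(i) and GetView(i) returning std::string_view, which arrow::LargeStringArray does in recent Arrow releases. Mapping null slots to empty strings is an assumption made here for simplicity. (Arrow also has GetString(i), which returns a std::string copy directly.)

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical helper (not part of Arrow): copies every element of a
// string-like array into an owning std::vector<std::string>.
// StringArrayT is assumed to provide length(), IsNull(i) and GetView(i),
// as arrow::LargeStringArray and arrow::StringArray do.
template <typename StringArrayT>
std::vector<std::string> CopyStrings(const StringArrayT& array) {
  std::vector<std::string> out;
  out.reserve(static_cast<std::size_t>(array.length()));
  for (std::int64_t i = 0; i < array.length(); ++i) {
    if (array.IsNull(i)) {
      // Assumption: represent a null slot as an empty string.
      out.emplace_back();
      continue;
    }
    // GetView(i) does not allocate; constructing std::string makes the copy.
    out.emplace_back(array.GetView(i));
  }
  return out;
}
```

With the strlist pointer from the question, this would be called as std::vector<std::string> strings = CopyStrings(*strlist);. If you only need to read the strings while the array is alive, keeping the std::string_view values avoids the copies entirely.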