I have a Halide::Runtime::Buffer and would like to remove elements that match a criterion, ideally such that the operation occurs in-place and the function can be defined in a Halide::Generator.
I have looked into using reductions, but it seems to me that I cannot output a vector of a different length -- I can only set certain elements to a value of my choice.
So far, the only way I got it to work was by using an extern "C" call and passing the Buffer I wanted to filter, along with a boolean Buffer (1's and 0's as ints). I read the Buffers into vectors of another library (Armadillo), applied my desired filter, then read the filtered vector back into Halide.
This seems quite messy. Also, with this code I'm passing a Halide::Buffer object, not a Halide::Runtime::Buffer object, so I don't know how to implement this within a Halide::Generator.
So my question is twofold:
extern "C" functions within Generators?
The first part is effectively stream compaction. It can be done in Halide, though the output size will either need to be fixed or be a function of the input size (e.g. the same size as the input). One can also produce the maximum index written as an additional output, to indicate how many results were produced. I wrote up a bit of an answer on how to do a prefix sum based stream compaction here: Halide: Reduction over a domain for the specific values. It is an open question how to do this most efficiently in parallel across a variety of targets, and we hope to do some work on exploring that space soon.
Whether this is in-place or not depends on whether one can put everything into a single series of update definitions for a Func. For example, it cannot be done in-place on an input passed into a Halide filter, because reductions always allocate a buffer to work on. It may be possible if the input is produced inside the Generator.
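To make the prefix sum idea concrete, here is a minimal sketch in plain C++ rather than Halide, so it only illustrates the data flow and is not the Halide formulation from the linked answer. The function name compact_by_prefix_sum and the convention that a nonzero mask entry means "keep" are assumptions made for illustration. The output has the same size as the input, and the returned count says how many leading entries are valid.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of Halide): keeps input[i] wherever
// mask[i] is nonzero. Assumes mask.size() == input.size().
// `output` is resized to the input size; only the first `count`
// entries are meaningful, the rest are zero-filled.
std::size_t compact_by_prefix_sum(const std::vector<int32_t> &input,
                                  const std::vector<uint8_t> &mask,
                                  std::vector<int32_t> &output) {
    const std::size_t n = input.size();

    // Step 1: exclusive prefix sum of the mask. dest[i] is the number of
    // kept elements before position i, i.e. the output slot for input[i].
    std::vector<std::size_t> dest(n);
    std::size_t running = 0;
    for (std::size_t i = 0; i < n; ++i) {
        dest[i] = running;
        running += (mask[i] != 0) ? 1 : 0;
    }

    // Step 2: scatter each kept element to its slot. The output size is
    // a function of the input size, not of the number of kept elements.
    output.assign(n, 0);
    for (std::size_t i = 0; i < n; ++i) {
        if (mask[i] != 0) {
            output[dest[i]] = input[i];
        }
    }

    // The total of the prefix sum is the number of results produced.
    return running;
}
```

In Halide the two loops would roughly correspond to a reduction that computes the prefix sum and a second reduction that scatters into the output Func, with the count exposed as an extra output.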
Re: the second question, are you using define_extern? This is not super well integrated with Halide::Runtime::Buffer, as the external function must be implemented with halide_buffer_t, but it is fairly straightforward to access from within a Generator. We don't have a tutorial on this yet, but there are a number of examples in the tests. E.g.:
https://github.com/halide/Halide/blob/master/test/generator/define_extern_opencl_generator.cpp#L19
and the definition:
https://github.com/halide/Halide/blob/master/test/generator/define_extern_opencl_aottest.cpp#L119
(These do not need to be extern "C", as I implemented C++ name mangling a while back. Just set the name mangling parameter to define_extern to NameMangling::CPlusPlus and remove the extern "C" from the external function's declaration. This is very useful as it gets one link-time type checking on the external function, which catches a moderately frequent class of errors.)