In OpenCL, we get an efficient hardware path for input arguments when we declare them as const global * restrict, as in this piece of handwritten OpenCL code:
__kernel void oclConvolveGlobalMem(const global float* restrict input,
constant float* restrict filterWeights,
global float* restrict output)
However, as seen with HL_DEBUG_CODEGEN=1, Halide generates:
// Address spaces for kernel_conv_70_s0_y___block_id_y
#define __address_space__conv__70 __global
#define __address_space__input __global
#define __address_space__kernel __global
__kernel void kernel_conv_70_s0_y___block_id_y(
const int _conv__70_extent_0,
const int _conv__70_extent_1,
const int _conv__70_min_0,
const int _conv__70_min_1,
const int _conv__70_stride_1,
const int _input_min_0,
const int _input_min_1,
const int _input_stride_1,
const int _kernel_min_0,
const int _kernel_min_1,
const int _kernel_stride_1,
__address_space__conv__70 float *_conv__70,
__address_space__input const float *_input,
__address_space__kernel const float *_kernel,
__address_space___shared int16* __shared)
where the input argument is not declared restrict. I expect this to severely limit performance. How do I get Halide to declare these pointers as restrict (the buffers they point to do not alias)?
When did you last update Halide? Halide added restrict to buffer arguments fairly recently (October 2016): https://github.com/halide/Halide/pull/1550. The latest binary release does include this change, but only barely.
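For readers unsure what the qualifier actually promises, here is a minimal sketch in plain C99 (not OpenCL C, and not what Halide emits). The function name `convolve1d` and its parameters are made up for illustration. Marking the pointers `restrict` tells the compiler that the three buffers never overlap, so it is free to keep loads from `input` in registers or reorder them across stores to `output`. OpenCL C is based on C99 and uses the same keyword, combined with the address space qualifiers shown in the question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 1D convolution, written only to illustrate `restrict`.
 * The qualifiers are a promise by the caller that `input`, `weights`
 * and `output` do not alias each other. Breaking that promise is
 * undefined behaviour; the compiler does not check it. */
void convolve1d(const float *restrict input,
                const float *restrict weights,
                float *restrict output,
                size_t n, size_t taps)
{
    assert(taps > 0 && taps <= n);

    /* Produces n - taps + 1 output samples ("valid" convolution window). */
    for (size_t i = 0; i + taps <= n; ++i) {
        float acc = 0.0f;
        for (size_t k = 0; k < taps; ++k) {
            /* Without restrict, the compiler must assume the store to
             * output[i] below could modify input[] or weights[]. */
            acc += input[i + k] * weights[k];
        }
        output[i] = acc;
    }
}
```

Calling it with `input = {1, 2, 3, 4, 5}` and `weights = {0.25, 0.5, 0.25}` fills a three-element `output` with `{2, 3, 4}`. Whether the qualifier changes the generated code on a given GPU depends on the driver's compiler, so it is worth comparing timings before and after updating Halide.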