Loop unrolling in Metal kernels


I need to force the Metal compiler to unroll a loop in my compute kernel function. So far I've tried putting #pragma unroll(num_times) before the for loop, but the compiler ignores it.

The compiler doesn't seem to unroll loops automatically: I compared execution times for (1) the code with a for loop and (2) the same code with the loop unrolled by hand. The hand-unrolled version was 3 times faster.

E.g.: I want to go from this:

for (int i=0; i<3; i++) {
    do_stuff();
}

to this:

do_stuff();
do_stuff();
do_stuff();

Is there even such a thing as loop unrolling in the Metal C++ language? If so, how can I tell the compiler that I want a loop unrolled?


Solution

  • Metal is based on a subset of C++11, so you can try using template metaprogramming to unroll loops. The following compiled in Metal, though I haven't had time to test it properly:

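    // Calls f() N times by instantiating the template once per iteration.
    template <unsigned N> struct unroll {

        template<class F>
        static void call(F f) {
            f();                    // one iteration
            unroll<N-1>::call(f);   // remaining N-1 iterations, expanded at compile time
        }
    };

    // Base case: N == 0 does nothing and stops the expansion.
    template <> struct unroll<0u> {

        template<class F>
        static void call(F) {}
    };

    kernel void test() {

        // Intended to expand to three consecutive do_stuff() calls.
        unroll<3>::call(do_stuff);

    }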
    

    Please let me know if it works! You'll probably need to add parameters to call so that it can pass arguments through to do_stuff.

    See also: Self-unrolling macro loop in C/C++
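
The suggestion above about passing arguments through call could look roughly like the sketch below. It is plain host-side C++11 rather than tested Metal code, and the names unroll_arg, Accumulate and sum_three are hypothetical, introduced only for illustration. A functor stands in for do_stuff so that the call can carry its own state. In an actual kernel the pointer would presumably need a Metal address-space qualifier (for example thread int*), which is an assumption not verified here.

```cpp
// Plain C++11 sketch; unroll_arg, Accumulate and sum_three are hypothetical names.
template <unsigned N> struct unroll_arg {
    // Calls f(arg) once, then lets unroll_arg<N-1> emit the remaining calls.
    template <class F, class A>
    static void call(F f, A arg) {
        f(arg);
        unroll_arg<N - 1>::call(f, arg);
    }
};

// Base case: N == 0 ends the compile-time expansion.
template <> struct unroll_arg<0u> {
    template <class F, class A>
    static void call(F, A) {}
};

// Stand-in for do_stuff: adds 'step' to a running total through a pointer.
struct Accumulate {
    int step;
    void operator()(int* total) const { *total += step; }
};

// Makes three Accumulate calls on the same total, with no runtime loop.
int sum_three(int step) {
    int total = 0;
    unroll_arg<3>::call(Accumulate{step}, &total);
    return total;
}
```

Here sum_three(2) makes three calls that each add 2 to total. Whether the Metal compiler inlines the resulting call chain into straight-line code is still up to the optimizer, so it is worth re-running the timing comparison from the question.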