Let's say that I have two instances of wmma::fragment<wmma::accumulator, 16, 16, 16, half> a, b; (namely a and b). How would I go about conducting an element-wise addition of a and b and storing the result back into a?
The elements of a wmma fragment are stored in the registers of the threads of a warp, so element-wise operations are possible as long as each thread knows what to do with the elements it holds.
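As a minimal sketch of doing this by hand: each thread can loop over its own slice of the fragment through the x and num_elements members that wmma::fragment exposes. The names AccFrag, add_into and add_tiles are hypothetical and only serve this illustration. The sketch assumes both fragments have exactly the same template parameters, so that element i in a given thread refers to the same matrix position in a and in b.

```cuda
#include <cuda_fp16.h>
#include <mma.h>

using namespace nvcuda;

// Fragment type from the question (alias name is hypothetical).
using AccFrag = wmma::fragment<wmma::accumulator, 16, 16, 16, half>;

// Hypothetical helper: a += b, element by element.
// Each thread only touches the elements held in its own registers.
// Assumes a and b share the same fragment type, hence the same element mapping.
__device__ inline void add_into(AccFrag& a, const AccFrag& b) {
    for (int i = 0; i < a.num_elements; ++i) {
        a.x[i] = __hadd(a.x[i], b.x[i]);
    }
}

// Hypothetical demo kernel for a single warp: loads two 16x16 half tiles
// (row-major, leading dimension 16), adds them, and stores the sum.
__global__ void add_tiles(const half* in_a, const half* in_b, half* out) {
    AccFrag a, b;
    wmma::load_matrix_sync(a, in_a, 16, wmma::mem_row_major);
    wmma::load_matrix_sync(b, in_b, 16, wmma::mem_row_major);

    add_into(a, b);

    wmma::store_matrix_sync(out, a, 16, wmma::mem_row_major);
}
```

The kernel would be launched with one full warp, for example add_tiles<<<1, 32>>>(d_a, d_b, d_out), and compiled for sm_70 or newer. This pattern is only safe for operations that treat every element the same way, because the mapping from x[i] to a matrix row and column is not specified.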
Scientists at the Tokyo Institute of Technology have developed a C++ library, wmma_extension, that makes arithmetic operations on wmma fragments easy. It also offers other functions, such as recovering FP32 accuracy from TF32 tensor core operations.
The library can be found here: https://github.com/wmmae/wmma_extension
How to do arithmetic operations as a simple one-liner (plus the include) is shown here: https://github.com/wmmae/wmma_extension/blob/main/docs/ops.md
The scientists have released two related papers in 2023: