I'm looking at an application that does not require 32 bits of precision; 12-16 bits will be enough.
Is Thrust capable of dealing with float16s (i.e., packing/unpacking two 16-bit floats into a 32-bit word)?
Should I use fixed-point arithmetic?
CUDA hardware does not include native support for half-precision arithmetic, only conversion to and from float.
Since C does not have a built-in half type, the conversion intrinsics use unsigned short:
unsigned short __float2half_rn(float);  // float -> half bits, round-to-nearest-even
float __half2float(unsigned short);     // half bits -> float (exact: every half value is representable in float)
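
Both intrinsics are device functions, so the usual pattern is to keep the data in memory as unsigned short and widen to float only for the arithmetic. A minimal sketch of that pattern (the kernel and helper names here are illustrative, not from any library):

__global__ void scale_half(unsigned short *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float x = __half2float(data[i]);        // unpack: half -> float
        data[i] = __float2half_rn(x * factor);  // repack: round result back to half
    }
}

// Packing two halves into one 32-bit word, as asked above:
__device__ unsigned int pack_halves(float a, float b)
{
    return (unsigned int)__float2half_rn(a)
         | ((unsigned int)__float2half_rn(b) << 16);
}

The same conversions can be done inside a Thrust functor, so a thrust::device_vector<unsigned short> should work as half-precision storage even though Thrust has no half type of its own.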