I’ve compiled a pixel shader that uses the D3DCOLORtoUBYTE4 intrinsic, then decompiled it. Here’s what I found:
r0.xyzw = float4(255.001953,255.001953,255.001953,255.001953) * r0.zyxw;
o0.xyzw = (int4)r0.xyzw;
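For reference, even a minimal shader produces this pattern (a hypothetical repro, not my exact source):

// compiled with fxc /T ps_4_0
int4 main(float4 color : COLOR0) : SV_Target
{
    return D3DCOLORtoUBYTE4(color);
}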
The rgba->bgra swizzle is expected, but why does it use 255.001953 instead of 255.0? The Data Conversion Rules documentation is quite specific about what should happen. It says the following:
Convert from float scale to integer scale: c = c * (2^n-1).
With n = 8 for UBYTE4 that's c * 255, so I expected exactly 255.0.
The short answer is: this is just part of the definition of the intrinsic. It's implemented that way in all modern versions of the HLSL compiler.
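In HLSL terms, the compiler effectively expands the intrinsic like this (a sketch inferred from the disassembly in the question, not the compiler's actual source; the function name is made up):

int4 D3DCOLORtoUBYTE4_expanded(float4 c)
{
    // swizzle rgba -> bgra, scale by the biased constant, truncate to int
    return int4(c.zyxw * 255.001953f);
}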
255.0 as a 32-bit float is represented in binary as
0100`0011`0111`1111`0000`0000`0000`0000
The constant printed as 255.001953 is actually the 32-bit float 255.001953125, which in binary is:
0100`0011`0111`1111`0000`0000`1000`0000
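You can check both encodings yourself with asuint(), which reinterprets a float's bits as a uint (a small sketch; expected values in the comments):

uint2 main() : SV_Target
{
    uint lo = asuint(255.0f);          // 0x437F0000
    uint hi = asuint(255.001953125f);  // 0x437F0080, one bit higher (+0x80)
    return uint2(lo, hi);
}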
This slight upward bias helps in specific edge cases, such as an input value of 0.999999: with 255.0 the product is 254.99974..., which truncates to 254, while with 255.001953 it is 255.00169..., which truncates to 255. In most other cases, the answer after converting to integer (using truncation) is the same either way.
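Spelled out as code (a sketch; the arithmetic is constant-folded here, but the same IEEE math applies to runtime inputs):

int2 main() : SV_Target
{
    float c = 0.999999f;                     // stored as 0.99999898...
    int plain  = (int)(c * 255.0f);          // 254.99974... truncates to 254
    int biased = (int)(c * 255.001953125f);  // 255.00169... truncates to 255
    return int2(plain, biased);              // (254, 255)
}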
There are some useful and interesting musings on floating-point numbers here.