How can I round a double-precision floating-point number to the nearest value that can be stored in an 8-bit floating-point format?
I'm trying to do it mathematically, but I have no idea how to go about it.
I have a double number x and I need to find the nearest y that I can express as n*2^b, with n and b integers and n in [-128,127]. But how can I find the best n and b?
I've solved it with this algorithm:
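function y = DoubleTo8bit(x)
% Round x to the nearest value of the form m*2^b with an 8-bit signed m.
    if x == 0
        y = 0;
        return;
    end
    s = sign(x);
    x = abs(x);
    % Choose b so that x/2^b lies in [64,128): 7 magnitude bits for either sign.
    % (8 bits for negative x would allow values such as -201, which do not fit.)
    b = floor(log2(x)+1) - 7;
    % abs(m) can round up to 128; 128*2^b equals 64*2^(b+1), so y is still valid.
    m = s*round(x/2^b);
    y = m*2^b;
end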
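
Since the question asks for n and b themselves, here is a minimal sketch that returns them together with y. The name DoubleTo8bitParts is hypothetical, not something from the original post. It assumes b is unbounded, as in the question, and that x is finite and not so extreme that 2^b underflows or overflows. It uses the two-output form of log2 to get the exponent without rounding error, and keeps 7 magnitude bits for both signs, because -128*2^b equals -64*2^(b+1) and so the extra negative code adds no new values.

```matlab
function [n, b, y] = DoubleTo8bitParts(x)
% Hypothetical helper: integers n and b with n in [-128,127]
% such that y = n*2^b is a nearest representable value to x.
    if x == 0
        n = 0; b = 0; y = 0;
        return;
    end
    [~, e] = log2(abs(x));   % abs(x) = f*2^e with 0.5 <= f < 1
    b = e - 7;               % abs(x)/2^b now lies in [64,128)
    n = round(x/2^b);        % ties round away from zero; abs(n) in [64,128]
    if n == 128              % +128 does not fit, but 128*2^b == 64*2^(b+1)
        n = 64;
        b = b + 1;
    end
    y = n*2^b;
end
```

For example, [n, b, y] = DoubleTo8bitParts(0.3) should give n = 77, b = -8 and y = 0.30078125, while an input of 127.6 is renormalized to n = 64, b = 1, y = 128.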