What are the x86 equivalents of ceil and floor? I couldn't find the corresponding instructions by searching. The equivalent doesn't necessarily have to be a single instruction, though one instruction would be preferred.
One-instruction floor/ceil is only available with SSE4.1 roundsd / roundpd, and only for XMM registers, not legacy x87.
With x87, you set the current rounding mode (the RC field, bits 10-11 of the x87 control word, read and written with fnstcw / fldcw) and then use frndint. (https://masm32.com/masmcode/rayfil/tutorial/fpuchap1.htm shows where the RC bits are in the x87 control word.) The available rounding modes are nearest-even (default), towards +Inf (ceil), towards -Inf (floor), and towards zero (trunc).
Don't forget to set the rounding mode back when you're done.
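Apparently frndint is slow on some CPUs (see the instruction tables at https://agner.org/optimize), so it can actually be faster to convert to integer and back with fistp / fild (still with the rounding mode set appropriately). But that only works for FP values that can be represented as a signed 64-bit integer (assuming you use a qword memory operand). Alternatively, you can add and then subtract a magic number: a large power of two with the same sign as the input, sized to the precision in use (e.g. 2^52 for double precision), pushes the value into a range where fractional bits can't be represented, so the addition rounds them off. That rounding follows the current rounding mode, so for floor or ceil you still need to set it.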
Of course, if you want (int)floor(x), then definitely just use fistp with the rounding mode set as desired.
With SSE3, you can use fisttp to truncate toward zero regardless of the current rounding mode (it was added to speed up C float->int casts in code that still uses legacy x87 even though SSE is available).
floor == trunc for non-negative values, so you could take advantage of efficient fisttp or SSE XMM truncation (cvttsd2si) for that case.
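In C, a plain cast truncates toward zero, which is exactly this fast path; on x86-64 compilers emit cvttsd2si for a double->long cast, and x87 code built with SSE3 enabled can use fisttp. The helper names are hypothetical, and floor_via_trunc is a common fix-up idiom added for illustration, not something from the answer above.

```c
#include <assert.h>

/* For x >= 0, floor(x) == trunc(x), so the cast alone is enough.
   Assumes floor(x) fits in a long. */
long floor_nonneg(double x)
{
    assert(x >= 0.0);
    return (long)x;            /* truncates toward zero, ignores rounding mode */
}

/* Hypothetical extension to negative inputs: truncation rounds negative
   non-integers up, so step down by one when that happened. */
long floor_via_trunc(double x)
{
    long t = (long)x;
    return (double)t > x ? t - 1 : t;
}
```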
With SSE4.1, for scalar double in XMM registers, you'd use roundsd to round to an integer-valued double, with your choice of rounding mode specified by the immediate operand. (This is not a conversion, just double->double rounding like frndint, so it works for any value.) Packed and scalar single- and double-precision versions are available: roundss, roundsd, roundps and roundpd.
With SSE4.1, roundsd + cvtsd2si is your best bet for (int)floor(x). Or, if you know your value is non-negative, you can just use SSE2 cvttsd2si. Note the extra t for "truncate". (The same goes for single precision with cvtss2si / cvttss2si, or for packed SIMD with cvtpd2dq / cvttpd2dq.)