I am considering vectorizing some floor() calls using SSE2 intrinsics and then measuring the performance gain. Ultimately, though, the binary is going to run on a virtual machine that I have no access to.
I don't really know how a VM works. Is a binary executed entirely on a software-emulated virtual CPU?
If not, and supposing the VM runs on a CPU with SSE2, could the VM use the host CPU's SSE2 instructions when executing an SSE2 instruction from my binary?
Could my vectorization be beneficial on the VM?
I don't really know how a VM works. Is a binary executed entirely on a software-emulated virtual CPU?
For serious purposes, no: full software emulation is too slow. (Some emulators do work that way, e.g. Bochs, and that can be useful for kernel debugging, among other things.)
The binary is executed "normally", directly on the host CPU, as much as possible. This generally means that any code that doesn't try to interact with the OS is executed directly. Operations such as system calls, by contrast, are likely to require the involvement of the VM implementation.
If not, and supposing the VM runs on a CPU with SSE2, could the VM use the host CPU's SSE2 instructions when executing an SSE2 instruction from my binary?
Yes.
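If you want the binary to confirm this for itself on the VM, a minimal sketch follows. It assumes GCC or Clang on x86, since `__builtin_cpu_supports` is a compiler builtin rather than standard C; the function name `sse2_available` is made up for illustration. Note that on x86-64, SSE2 is part of the baseline instruction set, so the check only really matters for 32-bit builds.

```c
/* Assumption: GCC or Clang targeting x86; __builtin_cpu_supports is a
   compiler builtin, not standard C. */

/* Hypothetical helper: returns 1 if the CPU the program sees (inside
   the VM, this is whatever the VM exposes) reports SSE2, else 0. */
int sse2_available(void)
{
    return __builtin_cpu_supports("sse2") ? 1 : 0;
}
```

A caller could use this to pick between the vectorized path and a plain scalar fallback at startup.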
Could my vectorization be beneficial on the VM?
Yes.
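For reference, here is one common way to write a floor with SSE2 only. This is a sketch, not necessarily what you have in mind: the names `floor_ps_sse2` and `floor_array_sse2` are made up, and the trick (truncate, then subtract 1 where truncation rounded up) only works for finite inputs whose magnitude fits in a 32-bit signed integer. It also turns -0.0f into +0.0f.

```c
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>     /* size_t */

/* Floor of four floats at once.
   Assumption: every lane is finite and |x| < 2^31; NaN, infinities and
   larger magnitudes give wrong results with this approach. */
static inline __m128 floor_ps_sse2(__m128 x)
{
    __m128i i   = _mm_cvttps_epi32(x);    /* truncate toward zero */
    __m128  t   = _mm_cvtepi32_ps(i);     /* back to float */
    __m128  gt  = _mm_cmpgt_ps(t, x);     /* all-ones where t > x, i.e.
                                             negative non-integers */
    __m128  one = _mm_set1_ps(1.0f);
    return _mm_sub_ps(t, _mm_and_ps(gt, one)); /* subtract 1 in those lanes */
}

/* Apply the floor to an array; any tail of fewer than four elements
   uses the same trick in scalar code. */
void floor_array_sse2(const float *in, float *out, size_t n)
{
    size_t k = 0;
    for (; k + 4 <= n; k += 4) {
        __m128 v = _mm_loadu_ps(in + k);  /* unaligned load */
        _mm_storeu_ps(out + k, floor_ps_sse2(v));
    }
    for (; k < n; ++k) {
        float t = (float)(int)in[k];      /* truncate toward zero */
        out[k] = t - (t > in[k] ? 1.0f : 0.0f);
    }
}
```

Whether this beats the scalar `floor()` on the VM is something only a measurement can tell you, but since the instructions run on the real CPU, a benchmark on comparable native hardware should be a reasonable indicator.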