I have a struct that represents a vector. This vector consists of two one-byte integers. I use them to keep values from 0 to 255.
typedef uint8_T unsigned char;
struct Vector
{
uint8_T x;
uint8_T y;
};
Now, the main use case in my program is to multiply both elements of the vector with a 32bit float value:
typedef real32_T float;
Vector Vector::operator * ( const real32_T f ) const {
return Vector( (uint8_T)(x * f), (uint8_T)(y * f) );
};
This needs to be performed very often. Is there a way that these two multiplications can be performed simultaneously? Maybe by vectorization, SSE or similar? Or is the Visual studio compiler already doing this simultaneously?
Another usecase is to interpolate between two Vectors.
Vector Vector::interpolate(const Vector& rhs, real32_T z) const
{
return Vector(
(uint8_T)(x + z * (rhs.x - x)),
(uint8_T)(y + z * (rhs.y - y))
);
}
This already uses an optimized interpolation aproach (https://stackoverflow.com/a/4353537/871495).
But again the values of the vectors are multiplied by the same scalar value. Is there a possibility to improve the performance of these operations?
Thanks
(I am using Visual Studio 2010 with an 64bit compiler)
In my experience, Visual Studio (especially an older version like VS2010) does not do a lot of vectorization on its own. They have improved it in the newer versions, so if you can, you might see if a change of compiler speeds up your code.
Depending on the code that uses these functions and the optimization the compiler does, it may not even be the calculations that slow down your program. Function calls and cache misses may hurt a lot more.
You could try the following:
float
to uint8_t
(e.g. if your floats are in range [0,1]), try using float
everywhere. This may allow the compiler do do far better optimization.