In Java there is the Vector API (jdk.incubator.vector). It makes it possible to perform arithmetic on several elements of a float[] array at once, ideally with one SIMD instruction per vector instead of one instruction per element.
For example:
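FloatVector fv = FloatVector.fromArray(FloatVector.SPECIES_PREFERRED, new float[]{1, 2, 3, 4, 5, 6, 7, 8}, 0);
// fromArray needs at least SPECIES_PREFERRED.length() elements after the offset, otherwise it throws IndexOutOfBoundsException
// multiplies all lanes by 2 in one SIMD operation (if the CPU supports this); vectors are immutable, so keep the result
FloatVector doubled = fv.mul(2f);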
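Note that a FloatVector holds only SPECIES.length() lanes (for example 8 floats for a 256-bit species), not an array of arbitrary length. To process a whole float[], the usual pattern is to loop in vector-sized steps and handle the leftover elements with scalar code. A minimal sketch, assuming JDK 16 or later with the incubator module enabled; ScaleArray and scale are hypothetical names:

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

import java.util.Arrays;

// Compile and run with: --add-modules jdk.incubator.vector
public class ScaleArray {
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    // Multiplies every element of src by factor and writes the result into dst
    static void scale(float[] src, float factor, float[] dst) {
        int i = 0;
        // Largest multiple of the lane count that does not exceed src.length
        int upperBound = SPECIES.loopBound(src.length);
        for (; i < upperBound; i += SPECIES.length()) {
            FloatVector v = FloatVector.fromArray(SPECIES, src, i);
            v.mul(factor).intoArray(dst, i);
        }
        // Scalar tail for the elements that do not fill a whole vector
        for (; i < src.length; i++) {
            dst[i] = src[i] * factor;
        }
    }

    public static void main(String[] args) {
        float[] src = {1, 2, 3, 4, 5, 6, 7};
        float[] dst = new float[src.length];
        scale(src, 2f, dst);
        System.out.println(Arrays.toString(dst));
    }
}
```

Because the tail is handled separately, this works for any array length, including the 7-element array from the original snippet.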
Now I would like to calculate the reciprocal 1f / x for every lane of the vector. Currently I do it with
FloatVector reciprocal = fv.pow(-1f);
I suspect this could be a slow operation. Is there a better way to do this?
I got this code to run on an Intel-architecture Windows laptop (in jshell started with --add-modules jdk.incubator.vector, so there is no class):
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;
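VectorSpecies<Float> SPECIES = FloatVector.SPECIES_256;
// every lane set to 1f (equivalent to FloatVector.zero(SPECIES).add(1f))
FloatVector ONE = FloatVector.broadcast(SPECIES, 1f);
FloatVector fv = FloatVector.fromArray(SPECIES, new float[]{1, 2, 3, 4, 5, 6, 7, 8}, 0);
FloatVector viaPow = fv.pow(-1f);
FloatVector viaDiv = ONE.div(fv); // gives the same result as the pow operation above (for these inputs)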
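The same idea extends to a whole array: create the all-ones vector once and divide it by each chunk of the input. The following is a sketch, assuming JDK 16 or later; ReciprocalArray and reciprocal are hypothetical names, and the scalar loop covers the elements that do not fill a full vector:

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorSpecies;

import java.util.Arrays;

// Compile and run with: --add-modules jdk.incubator.vector
public class ReciprocalArray {
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;
    // Created once and reused; every lane holds 1f
    static final FloatVector ONE = FloatVector.broadcast(SPECIES, 1f);

    // Writes 1f / src[i] into dst[i]
    static void reciprocal(float[] src, float[] dst) {
        int i = 0;
        int upperBound = SPECIES.loopBound(src.length);
        for (; i < upperBound; i += SPECIES.length()) {
            FloatVector v = FloatVector.fromArray(SPECIES, src, i);
            ONE.div(v).intoArray(dst, i);
        }
        // Scalar tail for the remaining elements
        for (; i < src.length; i++) {
            dst[i] = 1f / src[i];
        }
    }

    public static void main(String[] args) {
        float[] src = new float[100];
        for (int i = 0; i < src.length; i++) {
            src[i] = i + 1;
        }
        float[] dst = new float[src.length];
        reciprocal(src, dst);
        System.out.println(Arrays.toString(Arrays.copyOf(dst, 4)));
    }
}
```

Lane-wise division applies the ordinary / operator to each lane, so the vector path and the scalar tail should produce identical values.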
I did not do any performance measurements, since the results are probably platform dependent. But because ONE can be defined once as a constant, its construction does not count toward the per-call cost, so you could measure yourself whether ONE.div(fv) performs better than fv.pow(-1f).
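If you want to try that, a crude comparison could look like the following sketch. It is only an assumption of how one might set it up: the class and method names are hypothetical, System.nanoTime timing is easily distorted by JIT warm-up and dead-code elimination, and a dedicated harness such as JMH gives far more reliable numbers.

```java
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Compile and run with: --add-modules jdk.incubator.vector
// A crude timing sketch; a harness such as JMH gives far more reliable numbers.
public class ReciprocalTiming {
    static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;
    static final FloatVector ONE = FloatVector.broadcast(SPECIES, 1f);
    // Results are stored here so the JIT cannot discard the work as dead code
    static volatile float blackhole;

    // Sum of 1/x over all full vectors of src, computed with div
    static float sumDiv(float[] src) {
        FloatVector acc = FloatVector.zero(SPECIES);
        int upperBound = SPECIES.loopBound(src.length);
        for (int i = 0; i < upperBound; i += SPECIES.length()) {
            acc = acc.add(ONE.div(FloatVector.fromArray(SPECIES, src, i)));
        }
        return acc.reduceLanes(VectorOperators.ADD);
    }

    // Same sum, computed with pow(-1f)
    static float sumPow(float[] src) {
        FloatVector acc = FloatVector.zero(SPECIES);
        int upperBound = SPECIES.loopBound(src.length);
        for (int i = 0; i < upperBound; i += SPECIES.length()) {
            acc = acc.add(FloatVector.fromArray(SPECIES, src, i).pow(-1f));
        }
        return acc.reduceLanes(VectorOperators.ADD);
    }

    // Runs the chosen kernel reps times and returns the elapsed nanoseconds
    static long timeNanos(float[] src, boolean useDiv, int reps) {
        float sink = 0f;
        long start = System.nanoTime();
        for (int r = 0; r < reps; r++) {
            sink += useDiv ? sumDiv(src) : sumPow(src);
        }
        long elapsed = System.nanoTime() - start;
        blackhole = sink;
        return elapsed;
    }

    public static void main(String[] args) {
        float[] src = new float[1 << 14];
        for (int i = 0; i < src.length; i++) {
            src[i] = i + 1;
        }
        // Warm-up runs so both paths are JIT-compiled before measuring
        timeNanos(src, true, 5_000);
        timeNanos(src, false, 5_000);
        System.out.printf("div: %d us%n", timeNanos(src, true, 5_000) / 1_000);
        System.out.printf("pow: %d us%n", timeNanos(src, false, 5_000) / 1_000);
    }
}
```

Summing the results into a vector accumulator and reducing it to a scalar gives the JIT a value it has to keep, which is a cheap way to avoid the loop being optimized away; the numbers will still vary by CPU and JDK version.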