c++ · optimization · compiler-optimization · hardware

Performance of multiplying two identical/non-identical matrices


I am carrying out some performance tests on a scientific application and trying to take into account all the elements that can affect its performance (cache size and hierarchy, CPU speed, cache line size, and whatever else may be involved). This question came to my mind; it might be a stupid one, but I would like to make it clear to myself.

*Question:*

Correct me if I am wrong: the cost of processing an int versus a float or double value is different on the processor, because floating point values are computed on the CPU's floating point unit. Now I want to know whether there is a difference between filling two 2D matrices with the same float or double value and multiplying them, versus filling them with random float or double values and then multiplying them. Does the compiler use any caching for a matrix whose elements all have the same value?

Also, when processing a floating point operation like A·B, where A and B can be numbers with a different number of digits, does the size of A and B have any impact on processing time (for example for multiplication)? And if there is a difference, is it important to consider it or not? I am able to measure the performance of my application using a performance counter library, but because of the overhead of that library you cannot say for sure whether an instruction/FLOP variation comes from the random values or from other parameters such as I/D-cache misses, cache size, problem size, and so on.

Machine used: Intel E4500. Compiler: g++ 4.7.

Thanks


Solution

  • You are right that integer and floating point arithmetic costs are different but not as much as one could assume. It highly depends on which processor unit is used for the computation. Particularly for Intel processors, you can find helpful information in the "Optimization Reference Manual" available at http://www.intel.com/products/processor/manuals/. Appendix C lists instruction latencies for all instructions.

    To your specific question, whether the computation time for matrix multiplication depends on the entries of the two matrices being identical or random values, the answer is "no". If you look at the number and sequence of instructions as well as the memory access pattern at runtime, it is all the same in both cases. The compiler usually also can't take any advantage of the fact that a matrix is made up entirely of identical entries, because matrix multiplication needs to cover all possible cases. (Ok, unless you pack everything - filling the matrix entries and the multiplication itself - into one function and rule out all side effects like aliasing; then a very, very smart compiler could probably make something out of it, but we are not talking about that, right?) A simple timing experiment like the sketch after this answer can confirm this empirically.

    Also, the size in digits (I assume you are referring to decimal digits) does not matter. Every matrix entry is represented by all of its 32 bits in the case of single precision floating point numbers (or 64 bits in the case of double precision), regardless of how many decimal digits its value happens to have; the short check at the end of the sketches below illustrates this.
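
    As a rough illustration of the point about identical versus random entries, here is a minimal timing sketch (my own, not part of the original answer; the problem size n, the constant 3.14159, and the use of std::chrono and std::mt19937 are arbitrary choices). It fills two matrices either with one constant or with random doubles and times a naive triple-loop multiplication; since the instruction stream and memory access pattern are the same in both cases, the two timings should come out essentially equal.

    ```cpp
    #include <chrono>
    #include <cstddef>
    #include <iostream>
    #include <random>
    #include <vector>

    // Naive O(n^3) matrix multiplication: C += A * B (row-major storage).
    static void multiply(const std::vector<double>& A,
                         const std::vector<double>& B,
                         std::vector<double>& C, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            for (std::size_t k = 0; k < n; ++k)      // k before j: contiguous access to B and C
                for (std::size_t j = 0; j < n; ++j)
                    C[i * n + j] += A[i * n + k] * B[k * n + j];
    }

    // Fills two n-by-n matrices with values from gen() and times one multiplication.
    template <typename Gen>
    static double time_multiply(std::size_t n, Gen gen) {
        std::vector<double> A(n * n), B(n * n), C(n * n, 0.0);
        for (auto& x : A) x = gen();
        for (auto& x : B) x = gen();
        auto t0 = std::chrono::steady_clock::now();
        multiply(A, B, C, n);
        auto t1 = std::chrono::steady_clock::now();
        // Touch C so the compiler cannot discard the multiplication as dead code.
        return std::chrono::duration<double>(t1 - t0).count() + 0.0 * C[0];
    }

    int main() {
        const std::size_t n = 512;                   // arbitrary problem size
        std::mt19937 rng(42);
        std::uniform_real_distribution<double> dist(0.0, 1.0);

        double t_same   = time_multiply(n, [] { return 3.14159; });     // identical entries
        double t_random = time_multiply(n, [&] { return dist(rng); });  // random entries

        std::cout << "identical entries: " << t_same   << " s\n"
                  << "random entries:    " << t_random << " s\n";
    }
    ```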
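
    And to see directly that the number of decimal digits has no effect on how a value is stored, the following check (again my own sketch, not from the answer) prints the size and the raw bit pattern of two floats; both occupy exactly 32 bits.

    ```cpp
    #include <bitset>
    #include <cstdint>
    #include <cstring>
    #include <iostream>

    // Returns the raw 32-bit pattern of a float (memcpy avoids aliasing issues).
    static std::uint32_t bits_of(float f) {
        std::uint32_t u;
        std::memcpy(&u, &f, sizeof u);
        return u;
    }

    int main() {
        float few_digits  = 2.0f;        // "short" decimal representation
        float many_digits = 2.7182818f;  // "long" decimal representation

        std::cout << "sizeof(float) = " << sizeof(float) << " bytes\n";
        std::cout << std::bitset<32>(bits_of(few_digits))  << "\n";
        std::cout << std::bitset<32>(bits_of(many_digits)) << "\n";
        // Both values are stored in exactly 32 bits (1 sign, 8 exponent, 23 mantissa).
    }
    ```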