
Which should I choose: vDSP_mmul or cblas_dgemm?

I'm using the Accelerate framework for the first time for a huge matrix multiplication, but I don't understand the difference between vDSP and CBLAS in this case. Are they different in performance?

Solution

vDSP and CBLAS have different histories but overlap in functionality because they cover similar ground. In general, when looking at high-performance functions, you should look for the simplest one that covers your requirements. For example, CBLAS supports switching between row-major and column-major ordering, and cblas_dgemm also accepts transpose flags and scaling factors (it computes C = alpha·A·B + beta·C), while vDSP does not offer these options. Every option means there is some conditional inside the function, and every conditional means some time spent testing it. So, all else being equal, one would expect the vDSP version to be faster, since it probably does similar work internally while providing fewer options. Simpler functions are also simpler to call. Note also that vDSP_mmul works on single-precision floats while cblas_dgemm works on doubles; the like-for-like pairs are vDSP_mmul with cblas_sgemm, or vDSP_mmulD with cblas_dgemm.
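To make the difference in options concrete, here is a minimal C sketch of the same row-major, double-precision product C = A × B written both ways. It uses vDSP_mmulD (the double-precision form of vDSP_mmul) so that both calls take double. The wrapper names mul_vdsp, mul_cblas and mul_loop are hypothetical, and the Accelerate parts are compiled only on Apple platforms (guarded by __APPLE__); elsewhere only the plain reference loop is built.

```c
#include <assert.h>
#include <stddef.h>

#ifdef __APPLE__
#include <Accelerate/Accelerate.h>

/* vDSP: computes exactly C = A * B. A is m x p, B is p x n, C is m x n,
 * all row-major; strides of 1 mean each matrix is densely packed. */
static void mul_vdsp(const double *a, const double *b, double *c,
                     size_t m, size_t n, size_t p) {
    vDSP_mmulD(a, 1, b, 1, c, 1, m, n, p);
}

/* CBLAS: computes C = alpha * op(A) * op(B) + beta * C. Reproducing the
 * vDSP call means choosing row-major order, no transposes, alpha = 1,
 * beta = 0, and leading dimensions equal to the row lengths. */
static void mul_cblas(const double *a, const double *b, double *c,
                      size_t m, size_t n, size_t p) {
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                (int)m, (int)n, (int)p,
                1.0, a, (int)p,
                b, (int)n,
                0.0, c, (int)n);
}
#endif

/* Portable reference: the same row-major product as a plain triple loop. */
static void mul_loop(const double *a, const double *b, double *c,
                     size_t m, size_t n, size_t p) {
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < m; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            for (size_t k = 0; k < p; k++)
                sum += a[i * p + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
    }
}
```

The CBLAS call needs extra arguments (storage order, two transpose flags, alpha, beta, three leading dimensions) that vDSP_mmulD does not. Whether that costs anything measurable for a huge matrix is something a benchmark should decide, not the argument count.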

That said, the way to check performance is by measuring, not by making assumptions about how something might be implemented. In some cases, a hand-written for loop is much faster than the equivalent Accelerate function, because the compiler can optimize your loop for its specific context, which it cannot do for a call into a precompiled library. Again, only testing can tell you. On the other hand, Accelerate sometimes delivers dramatic improvements. (For more, see http://robnapier.net/fast-bezier-intro).