What is the optimal crossover point under which Strassen's Algorithm should stop the recursion and apply the multiplication, in terms of efficiency?
I know this is closely connected to the specific implementation and hardware but there should be some kind of guideline or some experimental results from someone for a general case.
Searching a bit online and asking some people they tend to think it's
n = 64;
or
n = 32;
Can anyone verify / reject these results?
This should be tuned on a per machine basis (a bit like ATLAS does). This kind of optimizations pay off for quite large matrices: if you code it yourself, and compare it to eg. a vendor BLAS implementation, then you would find a quite large n.
Memory requirements for Strassen algorithm are also something to weigh.