My processor is a 2.3 GHz Intel Core i7 (4 cores with hyper-threading), running macOS Sierra.
This is my program:
package nn

import org.nd4j.linalg.api.ndarray.INDArray
import org.nd4j.linalg.factory.Nd4j.randn
import org.nd4j.linalg.ops.transforms.Transforms._
import org.nd4s.Implicits._

object PerfTest extends App {

  // Layer sizes: 784 inputs, 30 hidden units, 10 outputs
  val topology = List(784, 30, 10)

  // One bias column vector per non-input layer
  val biases: List[INDArray] =
    topology.tail.map(size => randn(size, 1))

  // One weight matrix per pair of adjacent layers (rows = outputs, columns = inputs)
  val weights: List[INDArray] =
    topology.sliding(2).map(t => randn(t(1), t.head)).toList

  // Feed 100,000 random inputs forward through the network
  (1 to 100000).foreach { i =>
    val x = randn(784, 1)
    biases.zip(weights).foldLeft(List(x)) {
      case (as, (b, w)) =>
        val z = (w dot as.last) + b
        val a = sigmoid(z)
        as :+ a
    }
  }
}
When I run the above program with the default threading (for nd4j on this processor, that is 4 threads), it takes around 28 seconds.
When I limit it to a single thread (export OMP_NUM_THREADS=1), it takes 18 seconds.
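For anyone who wants to reproduce the comparison, here is a minimal sketch of how the timings could be taken inside the program, with a few warm-up runs so that JIT compilation is not counted. It is only an illustration, not necessarily how the numbers above were measured: TimingSketch, timeSeconds and workload are hypothetical names, and the workload is a plain-Scala stand-in that should be swapped for the forward-pass loop from PerfTest.

```scala
object TimingSketch extends App {

  // Hypothetical helper: runs `body` once and returns the elapsed
  // wall-clock time in seconds.
  def timeSeconds(body: => Unit): Double = {
    val start = System.nanoTime()
    body
    (System.nanoTime() - start) / 1e9
  }

  // Stand-in workload (an assumption for illustration only);
  // replace it with the forward-pass loop from PerfTest.
  def workload(): Unit = {
    var acc = 0.0
    (1 to 1000000).foreach(i => acc += math.sqrt(i.toDouble))
  }

  // Warm-up runs first, then several measured runs.
  (1 to 3).foreach(_ => workload())
  val timings = (1 to 5).map(_ => timeSeconds(workload()))

  println(f"min ${timings.min}%.3f s, max ${timings.max}%.3f s")
}
```

Running this once with the default settings and once after export OMP_NUM_THREADS=1 in the same shell gives two sets of numbers that can be compared directly.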
Any idea why this is? Thank you.
I wasn't able to find an explanation for this, so I migrated to Breeze, which is about 6 times faster, without much ado.