java c performance cosine-similarity

Cosine-similarity performance in Java 15 times slower than equivalent C?


I have two functions, each of which calculates the cosine similarity of two different vectors. One is written in Java, and one in C.

In both cases I declare two 200-element arrays inline and then calculate their cosine similarity 1 million times. I'm not counting the time the JVM takes to start up. The Java implementation is nearly 15 times slower than the C implementation.

My questions are:

1.) Is it reasonable to assume that, for tight loops of simple math, C is still an order of magnitude faster than Java?

2.) Is there a mistake in the Java code, or some sane optimization that would dramatically speed it up?

Thanks.

C:

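#include <math.h>

/* Declared before main() so the call sees the correct return type. */
double calc(void);

int main(void)
{
  int j;
  for (j = 0; j < 1000000; j++) {
    calc();
  }

  return 0;
}

/* Cosine similarity of two fixed 200-element vectors.
   Returns double; an int return type would truncate the result. */
double calc(void)
{
  double a [200] = {0.269852, -0.720015, 0.942508, ...};
  double b [200] = {-1.566838, 0.813305, 1.780039, ...};

  double p = 0.0;   /* dot product */
  double na = 0.0;  /* squared norm of a */
  double nb = 0.0;  /* squared norm of b */

  int i;
  for (i = 0; i < 200; i++) {
    p += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }

  return p / (sqrt(na) * sqrt(nb));
}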

$ time ./cosine-similarity

0m2.952s

Java:

public class CosineSimilarity {

    public static void main(String[] args) {

        long startTime = System.nanoTime();

        for (int i = 0; i < 1000000; i++) {
            calc();
        }

        long endTime = System.nanoTime();
        long duration = (endTime - startTime);

        // %n belongs at the end so the line reads "took N seconds"
        System.out.format("took %d seconds%n", duration / 1000000000);
    }

    // Cosine similarity of two fixed 200-element vectors.
    public static double calc() {

        double[] vectorA = new double[] {0.269852, -0.720015, 0.942508, ...};
        double[] vectorB = new double[] {-1.566838, 0.813305, 1.780039, ...};

        double dotProduct = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < vectorA.length; i++) {
            dotProduct += vectorA[i] * vectorB[i];
            // Math.pow is used here for squaring, unlike the C version
            normA += Math.pow(vectorA[i], 2);
            normB += Math.pow(vectorB[i], 2);
        }
        return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}

$ java -cp . -server -Xms2G -Xmx2G CosineSimilarity

took 44 seconds

Edit:

Math.pow was indeed the culprit. Removing it brought the performance right on par with that of C.
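
For reference, here is a minimal sketch of what that change might look like. The class name CosineSimilaritySketch, the method cosineSimilarity, and the short test vectors are illustrative assumptions, not the code or data from the original benchmark; the only point is that each square is computed by multiplication, as in the C version.

```java
public class CosineSimilaritySketch {

    // Hypothetical helper: the same loop as calc(), but squaring by
    // plain multiplication instead of Math.pow(x, 2).
    static double cosineSimilarity(double[] a, double[] b) {
        double dotProduct = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dotProduct += a[i] * b[i];
            normA += a[i] * a[i];   // replaces Math.pow(a[i], 2)
            normB += b[i] * b[i];   // replaces Math.pow(b[i], 2)
        }
        return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Short illustrative vectors, not the 200-element data from the question.
        double[] a = {1.0, 2.0, 3.0};
        double[] b = {2.0, 4.0, 6.0};
        // Parallel vectors, so the result should be very close to 1.0.
        System.out.println(cosineSimilarity(a, b));
    }
}
```

Passing the arrays in as parameters also keeps the array allocation out of the timed loop, which is a separate choice from the Math.pow fix.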


Solution

  • Math.pow(a, b) is a general-purpose power function, roughly equivalent to computing Math.exp(Math.log(a) * b), so it is a very expensive way to square a number.

    I suggest you write the Java code the same way you wrote the C code, multiplying each value by itself, to get a closer result.

    Note: the JVM can take a couple of seconds to warm up the code. I would run the test for longer.
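
    One way to account for warm-up is to run the workload once untimed and then time a second, longer pass. The sketch below is only an illustration under stated assumptions: WarmupSketch, work(), timeRun(), and sink are hypothetical names, and work() is a stand-in for the real calc() from the question.

```java
public class WarmupSketch {

    // Holds the accumulated result so the JIT cannot discard the calls as dead code.
    static double sink;

    // Stand-in workload (hypothetical); substitute the real calc() here.
    static double work() {
        double sum = 0.0;
        for (int i = 1; i <= 200; i++) {
            sum += Math.sqrt(i);
        }
        return sum;
    }

    // Runs work() the given number of times and returns the elapsed nanoseconds.
    static long timeRun(int iterations) {
        long start = System.nanoTime();
        double acc = 0.0;
        for (int i = 0; i < iterations; i++) {
            acc += work();
        }
        sink = acc;
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        timeRun(100_000);                 // warm-up pass, timing ignored
        long nanos = timeRun(1_000_000);  // measured pass
        System.out.printf("measured pass took %.3f ms%n", nanos / 1e6);
    }
}
```

    The iteration counts are arbitrary. For more rigorous measurements, a dedicated benchmark harness such as JMH is the usual choice.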