java nlp cosine-similarity dl4j

DL4J: How to calculate cosine similarity between the INDArrays returned by getWordVectorsMean


I have computed the mean word vector of two sentences like this:

    String demoString1 = "Enter first label";
    String demoString2 = "Enter first name";
    Collection<String> label1 = Splitter.on(' ').splitToList(demoString1);
    Collection<String> label2 = Splitter.on(' ').splitToList(demoString2);

    System.out.println("label1:==>" + label1);
    System.out.println("getWordVectorMatrix->INDArray------------------" + vectors.getWordVectorsMean(label1));

    System.out.println("label2:==>" + label2);
    System.out.println("getWordVectorMatrix->INDArray------------------" + vectors.getWordVectorsMean(label2));

Output:

label1:==>[Enter, first, label]
getWordVectorMatrix->INDArray------------------[0.02,  -0.14,  0.07,  -0.10,.............100 dimension vector]
label2:==>[Enter, first, name]
getWordVectorMatrix->INDArray------------------[-0.00,  -0.15,  0.07,  -0.13,............100 dimension vector]

How can I calculate the cosine similarity between the two sentences using their mean vectors? I searched, but I could not find an API for this in DL4J.


Solution

  • Method:

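    // Imports this method relies on (Guava, DL4J, ND4J).
    import com.google.common.base.Splitter;
    import org.deeplearning4j.models.word2vec.Word2Vec;
    import org.nd4j.linalg.ops.transforms.Transforms;
    import java.util.Collection;

    public static double cosineSimForSentence(Word2Vec vector, String sentence1, String sentence2) {
        // Split each sentence into tokens on single spaces.
        Collection<String> label1 = Splitter.on(' ').splitToList(sentence1);
        Collection<String> label2 = Splitter.on(' ').splitToList(sentence2);
        // Average the word vectors of each sentence, then compare the two means.
        // Any exception propagates to the caller instead of being caught and retried.
        return Transforms.cosineSim(vector.getWordVectorsMean(label1), vector.getWordVectorsMean(label2));
    }
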

    Method call:

    System.out.println("Similarity Score between: "+demoString1+" --vs-- "+ demoString2 +":==>"+ cosineSimForSentence(vectors, demoString1, demoString2));
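
For readers who want to see what the score represents, here is a minimal plain-Java sketch of the same idea: average the word vectors of each sentence, then take the cosine of the angle between the two means. It does not use DL4J. The class name CosineSimilaritySketch, both helper methods, and the 3-dimensional embedding values are made up for illustration and are not taken from a trained model, so the printed score will not match what Word2Vec produces.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: hypothetical helpers, not part of the DL4J or ND4J API.
final class CosineSimilaritySketch {

    // Element-wise mean of the vectors for the given tokens; unknown tokens are skipped.
    static double[] meanVector(Map<String, double[]> embeddings, List<String> tokens, int dim) {
        double[] mean = new double[dim];
        int found = 0;
        for (String token : tokens) {
            double[] v = embeddings.get(token);
            if (v == null) {
                continue; // token has no embedding
            }
            for (int i = 0; i < dim; i++) {
                mean[i] += v[i];
            }
            found++;
        }
        if (found == 0) {
            throw new IllegalArgumentException("no known tokens in " + tokens);
        }
        for (int i = 0; i < dim; i++) {
            mean[i] /= found;
        }
        return mean;
    }

    // cos(a, b) = (a . b) / (|a| * |b|)
    static double cosineSim(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("dimension mismatch");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            throw new IllegalArgumentException("cosine is undefined for a zero vector");
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        // Made-up 3-dimensional embeddings, for illustration only.
        Map<String, double[]> toy = new HashMap<>();
        toy.put("Enter", new double[]{0.1, -0.2, 0.3});
        toy.put("first", new double[]{0.0, -0.1, 0.2});
        toy.put("label", new double[]{0.3, 0.1, -0.1});
        toy.put("name", new double[]{0.2, 0.2, -0.2});

        double[] mean1 = meanVector(toy, Arrays.asList("Enter", "first", "label"), 3);
        double[] mean2 = meanVector(toy, Arrays.asList("Enter", "first", "name"), 3);
        System.out.println("Similarity: " + cosineSim(mean1, mean2));
    }
}
```

The score ranges from -1 (opposite directions) to 1 (same direction). Because it depends only on direction, scaling either mean vector leaves the result unchanged.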