java arrays nan cosine-similarity

How to: take the square root of the sum of squares for cosine similarity within an array (Java)


I am creating a book recommendation system, and I run into a problem when I take the square root of the sum of squares to determine similarity. I do not believe it is processing all the contents of each array.

The user is prompted with twenty books and rates each one from 1 to 5 based on how much they like it, or enters -1 if they have not read the book.

A few of my score outputs are NaN, so I assume it is stopping after the first element of the array.

I have tried rearranging the loops. I think the issue is with the loops and how they access the array.

Here is the CPU ratings file:

-1 1 1 4 1 3 3 1 2 3 4 -1 4 1 2 4 5 4 2 3
3 -1 2 3 -1 2 5 -1 3 3 5 2 2 1 2 3 5 3 4 2
-1 1 -1 4 1 3 5 2 1 5 3 -1 5 2 1 3 4 5 3 2
-1 -1 3 2 -1 5 5 2 2 4 4 2 3 2 -1 3 4 4 3 1
2 1 1 5 2 2 4 2 3 4 3 -1 5 2 2 5 3 5 2 1
3 -1 3 4 -1 2 5 -1 -1 4 3 -1 3 -1 2 5 5 5 4 2
4 -1 4 2 3 -1 1 3 4 -1 1 4 4 4 -1 2 -1 1 4 4
4 3 3 3 -1 2 2 4 3 -1 2 4 3 4 2 -1 -1 2 2 3
3 -1 3 -1 3 4 -1 5 5 -1 -1 -1 1 -1 -1 1 1 2 -1 5
3 -1 3 4 3 4 -1 5 5 2 3 3 4 1 1 -1 -1 -1 -1 4
4 -1 4 4 1 3 -1 5 4 -1 1 3 4 1 -1 1 -1 1 -1 5
5 -1 3 1 4 3 -1 5 4 1 3 2 1 -1 4 2 1 -1 2 4
3 -1 5 1 4 4 2 5 5 1 2 3 1 1 -1 1 -1 1 -1 5
4 1 5 4 3 -1 1 3 4 -1 -1 3 3 -1 1 1 2 -1 3 5
-1 1 1 3 -1 3 1 3 -1 -1 3 -1 5 2 2 1 4 -1 5 -1
3 -1 2 3 1 5 4 3 3 -1 5 -1 5 2 -1 4 4 3 3 3
1 1 1 3 2 4 1 -1 -1 -1 5 -1 3 -1 -1 1 -1 2 5 2
-1 2 3 5 -1 4 3 1 1 3 3 -1 4 -1 -1 4 3 2 5 1
-1 1 3 3 -1 3 3 1 -1 -1 3 -1 5 -1 -1 3 1 2 4 -1
3 -1 2 4 1 4 3 -1 2 3 4 1 3 -1 2 -1 4 3 5 -1
-1 1 3 5 -1 4 2 1 -1 3 3 2 3 2 -1 3 1 -1 3 -1
3 2 2 3 -1 5 -1 -1 2 3 4 -1 4 1 -1 -1 -1 -1 4 2
-1 3 -1 -1 4 -1 2 -1 2 2 2 5 -1 3 4 -1 -1 2 -1 2
1 4 3 -1 3 2 1 -1 -1 -1 1 3 1 3 3 1 -1 -1 -1 3
4 3 3 -1 4 2 -1 4 -1 -1 2 4 -1 3 4 2 -1 -1 -1 4
-1 5 1 -1 4 1 -1 3 2 2 -1 4 1 3 3 1 -1 -1 -1 3
-1 4 2 1 5 -1 -1 2 1 1 -1 5 -1 5 4 1 2 2 -1 1
2 5 2 -1 3 -1 -1 1 -1 2 -1 4 2 4 3 -1 2 1 -1 -1
2 5 1 1 4 -1 2 1 -1 -1 2 4 -1 3 4 2 -1 -1 -1 4

Here is the method that takes the square root of the squares:

    public static double sqrtSquares(double []A) {

        //check A for -1
        double sum = 0;
        for(int i = 0; i<A.length; i++) {
            if(A[i] < 0 ) {
                A[i] = 0;
            }

            A[i] = Math.sqrt(A[i]);

            //calculate the running sum;
            sum += A[i] * A[i] ;
        }
        return Math.sqrt(sum);
    }


    public static double similarity(double []A, double []B) {
        double sum = 0;
        double p1 = sqrtSquares(A);
        double p2 = sqrtSquares(B);

        for (int i=0; i<A.length; i++) {
            if (A[i]> 0) {
                if (B[i]> 0) {
                    sum += A[i]*B[i];
                }
            }

        }
        return sum/(p1*p2);
    }

Here is the main similarity score method:

        double []scores = new double[30];

        for(int i = 0; i< 30; i++) {
            scores[i] = similarity(yourrating, pplratings[i]);
        }
        for(int k = 0; k <scores.length; k++) {
            System.out.println("SCORES ["+ k + "] "+scores[k]);
        }
        return scores;
    }

At the end of the method, it prints the 30 scores computed from the two arrays. Here are the results with the error:

SCORES [0] 0.8345932239467343
SCORES [1] 0.8930284538287845
SCORES [2] 0.8859571865530889
SCORES [3] 0.8885782312086968
SCORES [4] 0.8775173350115371
SCORES [5] 0.9443223415026459
SCORES [6] 0.8250453876017286
SCORES [7] 0.8432290780758503
SCORES [8] 0.8862288358972311
SCORES [9] 0.7131697319344704
SCORES [10] 0.8182594818515688
SCORES [11] 0.8009904274635006
SCORES [12] 0.8637068116707501
SCORES [13] 0.8507371827482269
SCORES [14] 0.8370334932826162
SCORES [15] 0.775738787468209
SCORES [16] 0.880315376993314
SCORES [17] 0.7702419338621114
SCORES [18] 0.841428935139835
SCORES [19] 0.7527243233023518
SCORES [20] 0.8474342113753683
SCORES [21] 0.815084547094269
SCORES [22] 0.7592956404693546
SCORES [23] 0.7303452808509205
SCORES [24] 0.7808981699861455
SCORES [25] 0.7676319325573738
SCORES [26] 0.7782147276497292
SCORES [27] 0.7962287074180334
SCORES [28] 0.7538710355467405
SCORES [29] 0.7795507063811014

EDIT: this code now works. Thank you for everyone's help.


Solution

  • From our discussion and your explanation of the problem, the following issues were found in your code:

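    1. The logic in the sqrtSquares() function was flawed: it took the square root of each element and then squared it again, so the running sum added up the ratings themselves rather than their squares. It still needs correction because you are implementing cosine similarity, which requires the square root of the sum of squares. The right definition is provided by @hsin1.att214. I am writing it here again, for convenience: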
    public static double sqrtSquares(double []A) {
        double sum = 0;
        for(int i = 0; i<A.length; i++) {
            if(A[i] < 0 ) {
                A[i] = 0;
            }
            sum += A[i]*A[i];    // calculate the running sum of squares
        }
        return Math.sqrt(sum);   // calculate the square root of the sum of squares
    }
    
    2. There were two return statements, one of which was inside the for loop, so the function returned after processing just the first element of the array. Pull the return statement outside the loop.
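
To tie the two points together, here is a minimal sketch of the whole calculation, not the original poster's final code. The class name `CosineSimilaritySketch` and the helpers `rated` and `norm` are hypothetical names chosen for illustration. It assumes that -1 means "not read" and should count as 0, it leaves the input arrays unmodified, and it returns 0.0 when either vector has no ratings. That last choice is an assumption of this sketch: without it, a norm of 0 makes the division 0.0/0.0, which Java evaluates to NaN.

```java
public class CosineSimilaritySketch {

    // Hypothetical helper: treat any non-positive entry (such as the -1 "not read" marker) as 0.
    static double rated(double value) {
        return value > 0 ? value : 0.0;
    }

    // Square root of the sum of squares, computed without modifying the input array.
    public static double norm(double[] a) {
        double sum = 0;
        for (double v : a) {
            double r = rated(v);
            sum += r * r;          // running sum of squares
        }
        return Math.sqrt(sum);     // single return, outside the loop
    }

    // Cosine similarity: dot product divided by the product of the two norms.
    public static double similarity(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("arrays must have the same length");
        }
        double dot = 0;
        for (int i = 0; i < a.length; i++) {
            dot += rated(a[i]) * rated(b[i]);
        }
        double denom = norm(a) * norm(b);
        // Assumption of this sketch: report 0.0 rather than NaN when a vector has no ratings.
        return denom == 0 ? 0.0 : dot / denom;
    }

    public static void main(String[] args) {
        double[] you = {5, -1, 3, 4};
        double[] other = {4, 2, -1, 5};
        System.out.println("similarity = " + similarity(you, other));
        System.out.println(0.0 / 0.0);   // prints NaN
    }
}
```

Whether an unrated vector should score 0.0, be skipped, or be reported as an error depends on how the recommendation step uses the scores, so treat the zero-denominator guard as one option among several.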