I have two comma-delimited strings containing embeddings. Each element fits in a float, and each embedding is 128 elements long. I am following the linear algebra introduction for the ojAlgo library. I'd like to convert the two strings to ojAlgo matrices, normalize them, and then compute their cosine similarity. I am testing with a single matrix first: its cosine similarity with itself should be 1.0.
PhysicalStore.Factory<Double, Primitive32Store> storeFactory = Primitive32Store.FACTORY;
String dummyMatrixValues = "0.47058824,0.5647059,0.54901963,0.54509807,0.54901963";
Primitive32Store matrixR032 = storeFactory.rows(Arrays.stream(dummyMatrixValues.split(","))
.mapToDouble(Double::parseDouble)
.toArray());
System.out.println("Primitive32Store : " + matrixR032);
matrixR032.modifyAny(DataProcessors.STANDARD_SCORE);
System.out.println("Primitive32Store - normalized : " + matrixR032);
System.out.println(matrixR032);
System.out.println("matrixR032 " + matrixR032.multiply(storeFactory.make(matrixR032.transpose())));
[java] Primitive32Store : org.ojalgo.matrix.store.Primitive32Store < 1 x 5 >
[java] { { 0.47058823704719543, 0.5647059082984924, 0.5490196347236633, 0.545098066329956, 0.5490196347236633 } }
[java] Primitive32Store - normalized : org.ojalgo.matrix.store.Primitive32Store < 1 x 5 >
[java] { { NaN, NaN, NaN, NaN, NaN } }
[java] org.ojalgo.matrix.store.Primitive32Store < 1 x 5 >
[java] { { NaN, NaN, NaN, NaN, NaN } }
[java] matrixR032 org.ojalgo.matrix.store.Primitive32Store < 1 x 1 >
[java] { { NaN } }
[java]
However, my normalization results in NaN, and the input numbers are printed with additional digits I did not specify.
Update: when I switch to MatrixR064, the numbers no longer have seemingly random digits added to the end.
Primitive32Store matrixR032 = storeFactory.rows(Arrays.stream(dummyMatrixValues.split(","))
.mapToDouble(Double::parseDouble)
.toArray());
This is a somewhat messy way to do it; you can't really see what's going on. How about this way:
String dummyMatrixValues = "0.47058824,0.5647059,0.54901963,0.54509807,0.54901963";
String[] values = dummyMatrixValues.split(",");
PhysicalStore.Factory<Double, Primitive32Store> factory = Primitive32Store.FACTORY;
Primitive32Store vector = factory.make(values.length, 1);
for (int i = 0; i < values.length; i++) {
    vector.set(i, 0, Double.parseDouble(values[i]));
}
vector.modifyAny(DataProcessors.STANDARD_SCORE);
double norm = vector.norm();
double dotp = vector.dot(vector);
double similarity = dotp / (norm * norm);
System.out.println("norm: " + norm);
System.out.println("dotp: " + dotp);
System.out.println("similarity: " + similarity);
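Once you compare two different embeddings, the same formula applies with two vectors instead of one. Here is a minimal plain-Java sketch of that formula, deliberately without ojAlgo so the arithmetic is visible. The class name `CosineSketch`, its helper methods, and the second input string are made up for illustration.

```java
final class CosineSketch {

    // Parse a comma-delimited string such as "0.1,0.2,0.3" into a double array.
    static double[] parse(String csv) {
        String[] parts = csv.split(",");
        double[] out = new double[parts.length];
        for (int i = 0; i < parts.length; i++) {
            out[i] = Double.parseDouble(parts[i].trim());
        }
        return out;
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|).
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] a = parse("0.47058824,0.5647059,0.54901963,0.54509807,0.54901963");
        // Hypothetical second embedding, only here to have something to compare with.
        double[] b = parse("0.5,0.5,0.5,0.5,0.5");

        // A vector compared with itself should give 1.0, up to floating-point rounding.
        System.out.println("self similarity: " + cosine(a, a));
        System.out.println("a vs b: " + cosine(a, b));
    }
}
```

Cosine similarity already divides by both norms, so the inputs do not need to be normalized beforehand for this to work.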
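I assume the "additional digits" are representation errors. The 32 in the class name Primitive32Store
indicates that it uses 32-bit floats, so each value is rounded to the nearest float when stored, and the extra digits show up when it is printed as a double.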
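You can reproduce the effect without ojAlgo. The sketch below (class and method names are made up) parses the first value from the question as a float and widens it to double, which is my assumption about what happens when a 32-bit store hands values back for printing.

```java
final class FloatWideningSketch {

    // Round the decimal text to the nearest 32-bit float, then widen it to double.
    static double storedAsFloat(String text) {
        float asFloat = Float.parseFloat(text);
        return (double) asFloat;
    }

    public static void main(String[] args) {
        String text = "0.47058824";

        // Parsed directly as a double, the value prints the way it was written.
        System.out.println("as double: " + Double.parseDouble(text));

        // Stored as a float first, the nearest float is not exactly 0.47058824,
        // so widening exposes extra digits (the question's log shows
        // 0.47058823704719543 for this value).
        System.out.println("float widened to double: " + storedAsFloat(text));
    }
}
```

That is also why switching to a 64-bit type makes the extra digits disappear: the values are never rounded to float along the way.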
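The DataProcessors
class assumes data is stored in columns; in your case that means 1 column and 5 rows. You did the opposite (transposed), so each of the 5 columns held a single value. A single value has no spread to scale by, which is where the NaN comes from.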
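To see why a single-value column ends up as NaN, here is a plain-Java sketch of a column-wise standard score. It is not ojAlgo's implementation, and it assumes the sample standard deviation (dividing by n - 1); the class and method names are made up.

```java
final class StandardScoreSketch {

    // Standard score of one column: (value - mean) / standard deviation.
    static double[] standardScore(double[] column) {
        int n = column.length;

        double mean = 0.0;
        for (double v : column) {
            mean += v;
        }
        mean /= n;

        double sumOfSquares = 0.0;
        for (double v : column) {
            sumOfSquares += (v - mean) * (v - mean);
        }
        // Assumption: sample standard deviation. With n == 1 this is 0.0 / 0, i.e. NaN.
        double std = Math.sqrt(sumOfSquares / (n - 1));

        double[] scores = new double[n];
        for (int i = 0; i < n; i++) {
            scores[i] = (column[i] - mean) / std;
        }
        return scores;
    }

    public static void main(String[] args) {
        // One column with five rows: every score is finite.
        double[] column = {0.47058824, 0.5647059, 0.54901963, 0.54509807, 0.54901963};
        System.out.println(java.util.Arrays.toString(standardScore(column)));

        // One column with a single row, as in a 1 x 5 matrix: the score is NaN.
        System.out.println(java.util.Arrays.toString(standardScore(new double[] {0.47058824})));
    }
}
```

Dividing by n instead of n - 1 would not help: the standard deviation of a single value would then be 0, and 0 / 0 is still NaN.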