I'm doing some work trying to recommend documents, and to do so I am using the Cosine Similarity method. Here is the code for that method:
static double cosineSimilarity(HashMap<String, Double> v1, HashMap<String, Double> v2)
{
Set<String> both = v1.keySet();
both.retainAll(v2.keySet());
double sclar = 0, norm1 = 0, norm2 = 0;
for (String k : both)
{
sclar += v1.get(k) * v2.get(k);
}
for (String k : v1.keySet())
{
norm1 += v1.get(k) * v1.get(k);
}
for (String k : v2.keySet())
{
norm2 += v2.get(k) * v2.get(k);
}
return sclar / Math.sqrt(norm1 * norm2);
}
The problem is that the outcome varies depending on the order that the parameters are passed. For example if I call cosineSimilarity(v1, v2)
it will return 0.3
but if I call cosineSimilarity(v2, v1)
it will return a completely different value.
I figure this has something to do with the fact that Map.keySet()
returns a set backed by the map, but I do not fully understand the implications of this.
Can anyone see where the method is going wrong?
Try
Set<String> both = new HashSet<String>(v1.keySet());
instead of
Set<String> both = v1.keySet();
You shouldn't modify set you got from keySet
method because map uses it and when you remove elements from it (by retainAll
in your case) elements also removed from map. Example:
Map<Integer, Integer> mp = new HashMap<Integer, Integer>();
mp.put(1, 1);
System.out.println(mp); // output {1=1}
mp.keySet().remove(1);
System.out.println(mp); // output {}