I am trying to compute a semantic match between two sentences by comparing their dependencies.
I get two Stanford dependency trees from two different sentences, and I want to compare them to get a score for the semantic match between the sentences.
for (TypedDependency td1 : dependencyList1)
{
    for (TypedDependency td2 : dependencyList2)
    {
        score = td1.compareTo(td2);
    }
}
dependencyList1 and dependencyList2 are the lists of all dependencies from sentence 1 and sentence 2, respectively.
I am using the compareTo function, which returns scores of -1, 0, or 1.
I then average out the scores to come up with a final score.
I don't know how these scores are calculated.
Is there a better way to compare and identify similar dependencies?
Any help would be appreciated.
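compareTo() gives you an ordering between dependencies, e.g., for sorting (see https://docs.oracle.com/javase/7/docs/api/java/lang/Comparable.html). Its return value only tells you whether one dependency sorts before, equal to, or after the other, so averaging those values does not measure similarity. To find similar dependencies, you first need to formalize exactly what you mean by "similar", and then write a custom scoring function.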
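To see why averaging compareTo results is not informative, here is a minimal sketch. It uses plain String values as a stand-in for TypedDependency (an assumption made only to keep the example self-contained; both types implement Comparable), and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Stand-in demo: String plays the role of TypedDependency here (an assumption
// for illustration). Comparable only defines a sort order, not a similarity.
class CompareToAverageDemo {

    // Averages the sign of compareTo over all pairs, mirroring the nested loop
    // from the question (hypothetical helper, not part of any library).
    static <T extends Comparable<T>> double averageSign(List<T> first, List<T> second) {
        double sum = 0;
        for (T a : first) {
            for (T b : second) {
                sum += Integer.signum(a.compareTo(b));
            }
        }
        return sum / (first.size() * second.size());
    }

    public static void main(String[] args) {
        List<String> x = Arrays.asList("a", "b");
        List<String> y = Arrays.asList("c", "d");
        System.out.println(averageSign(x, x)); // identical lists: prints 0.0
        System.out.println(averageSign(x, y)); // nothing in common: prints -1.0
        System.out.println(averageSign(y, x)); // same pair, swapped: prints 1.0
    }
}
```

Two identical lists average to 0, while two lists with nothing in common average to -1 or +1 depending only on argument order, so the number reflects sort order rather than how alike the inputs are.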
A natural metric, beyond simple equality, is collapsing things like *subj (nsubj, nsubjpass, csubj, csubjpass) and *obj (dobj, iobj). If you care about the endpoints of the arcs, checking for a lemma match rather than a word match is probably a good start. Similarity in vector space (e.g., with word2vec or GloVe) is also quite effective.
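As a rough sketch of the relation collapsing and lemma matching described above: the Dep class below is a hypothetical, simplified stand-in for a typed dependency (relation name plus governor and dependent lemmas) rather than the Stanford TypedDependency class. The weights and the best-match averaging are arbitrary illustrative choices, not a recommended metric.

```java
import java.util.Arrays;
import java.util.List;

class DependencySimilarity {

    // Hypothetical stand-in for a typed dependency: relation name plus the
    // lemmas of the governor and dependent. Not the Stanford class.
    static final class Dep {
        final String relation;
        final String governorLemma;
        final String dependentLemma;

        Dep(String relation, String governorLemma, String dependentLemma) {
            this.relation = relation;
            this.governorLemma = governorLemma;
            this.dependentLemma = dependentLemma;
        }
    }

    // Collapse fine-grained relations into the coarse *subj and *obj classes.
    static String collapse(String relation) {
        switch (relation) {
            case "nsubj": case "nsubjpass": case "csubj": case "csubjpass":
                return "subj";
            case "dobj": case "iobj":
                return "obj";
            default:
                return relation;
        }
    }

    // Assumed weights: 0.5 for a matching collapsed relation, plus 0.25 for
    // each matching endpoint lemma. Different relation classes score 0.
    static double pairScore(Dep a, Dep b) {
        if (!collapse(a.relation).equals(collapse(b.relation))) {
            return 0.0;
        }
        double score = 0.5;
        if (a.governorLemma.equalsIgnoreCase(b.governorLemma)) score += 0.25;
        if (a.dependentLemma.equalsIgnoreCase(b.dependentLemma)) score += 0.25;
        return score;
    }

    // For each dependency in the first list, take its best match in the
    // second list, then average. Note that this is asymmetric.
    static double sentenceScore(List<Dep> first, List<Dep> second) {
        if (first.isEmpty()) return 0.0;
        double total = 0.0;
        for (Dep a : first) {
            double best = 0.0;
            for (Dep b : second) {
                best = Math.max(best, pairScore(a, b));
            }
            total += best;
        }
        return total / first.size();
    }

    public static void main(String[] args) {
        // Hand-written toy dependencies, not real parser output.
        List<Dep> s1 = Arrays.asList(new Dep("nsubj", "eat", "dog"), new Dep("dobj", "eat", "bone"));
        List<Dep> s2 = Arrays.asList(new Dep("nsubj", "eat", "dog"), new Dep("iobj", "give", "bone"));
        System.out.println(sentenceScore(s1, s2)); // prints 0.875
    }
}
```

To move toward the vector-space option, the exact lemma checks in pairScore could be replaced by a cosine similarity between word embeddings of the two endpoints, with the embeddings loaded from whatever word2vec or GloVe model you have available.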
The list of dependencies, for reference, can be found at: http://universaldependencies.github.io/docs/u/dep/index.html