I'm profiling my OWLAPI-based application, and the only bottleneck I found is in computing disjointness. For each class, I have to check whether it is disjoint from other classes and whether that disjointness is asserted or inferred.
This seems expensive to compute: unlike equivalence, which is based on the Node data structure (from which data is efficient to retrieve), disjointness is based on NodeSet, so I'm forced to perform more loops. This is the procedure I use:
private void computeDisjointness(OWLClass clazz) {
    // Ask the reasoner for every node (set of equivalent classes) disjoint with clazz
    NodeSet<OWLClass> disjointSetsFromCls = reasoner.getDisjointClasses(clazz);
    for (Node<OWLClass> singleDisjointSet : disjointSetsFromCls) {
        for (OWLClass item : singleDisjointSet) {
            // Compare each disjoint class against every asserted DisjointClasses axiom
            for (OWLDisjointClassesAxiom disjAxiom : ontology.getAxioms(AxiomType.DISJOINT_CLASSES)) {
                if (disjAxiom.containsEntityInSignature(item)) {
                    // asserted
                } else {
                    // derived
                }
            }
        }
    }
}
As you can see, the bottleneck comes from the three nested for loops, which slow down the application; moreover, computeDisjointness is executed for each class in the ontology.
Is there a more efficient way to get the disjointness and check if the axioms are asserted or derived?
One simple optimization is to move ontology.getAxioms(AxiomType.DISJOINT_CLASSES) to the calling method, then pass the result in as a parameter. This method returns a new set on each call, with the same contents every time, since you're not modifying the ontology. So if you have N classes you are creating at least N identical sets; more if many classes are actually disjoint.
Optimization number two: check the size of the disjoint node set. Size 1 means no disjoints, so you can skip the rest of the method.
Optimization 3: keep track of the classes you've already visited. E.g., if you have A disjointWith B, your code will be called on A and cycle over A and B, then be called on B and repeat the computation.
Keep a set of visited classes, to which you add all elements in the disjoint node set, and when it's B's turn you'll be able to skip the reasoner call as well. Speaking of which, I would assume the reasoner call is actually the most expensive call in this method. Do you have profiling data that says otherwise?
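To make optimizations 2 and 3 concrete, here is a minimal sketch in plain Java. It does not use OWLAPI at all: class names are plain strings, and the disjointGroupsOf function is a hypothetical stand-in for reasoner.getDisjointClasses(clazz), so collectDisjoints, the toy data and the "Nothing" group are all assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class DisjointnessSketch {

    /**
     * Walks over all classes, skipping those already reached as a disjoint
     * of an earlier class. disjointGroupsOf is a hypothetical stand-in for
     * the reasoner call; it is NOT an OWLAPI method.
     *
     * @return for each class that was queried and had disjoints, the
     *         flattened set of classes reported as disjoint with it
     */
    static Map<String, Set<String>> collectDisjoints(
            List<String> classes,
            Function<String, Set<Set<String>>> disjointGroupsOf) {
        Set<String> visited = new HashSet<>();
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (String clazz : classes) {
            // Optimization 3: skip classes already seen in an earlier result.
            if (!visited.add(clazz)) {
                continue;
            }
            Set<Set<String>> groups = disjointGroupsOf.apply(clazz);
            // Optimization 2: a single group is treated as "no disjoints".
            if (groups.size() <= 1) {
                continue;
            }
            Set<String> disjoints = new HashSet<>();
            for (Set<String> group : groups) {
                disjoints.addAll(group);
            }
            visited.addAll(disjoints);
            result.put(clazz, disjoints);
        }
        return result;
    }

    public static void main(String[] args) {
        // Toy data (assumed): A and B are disjoint; C only has the Nothing group.
        Map<String, Set<Set<String>>> toy = Map.of(
                "A", Set.of(Set.of("B"), Set.of("Nothing")),
                "B", Set.of(Set.of("A"), Set.of("Nothing")),
                "C", Set.of(Set.of("Nothing")));
        List<String> queried = new ArrayList<>();
        Map<String, Set<String>> result = collectDisjoints(
                List.of("A", "B", "C"),
                clazz -> { queried.add(clazz); return toy.get(clazz); });
        System.out.println("queried: " + queried);               // queried: [A, C]
        System.out.println("with disjoints: " + result.keySet()); // with disjoints: [A]
    }
}
```

Note that skipping B is only lossless if everything disjoint with B was already reported while processing A. If B could also be disjoint with some class that A is not, you would need to query B anyway, or record the pairs as you go.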
Optimization 4: I'm not convinced this code reliably tells you which disjoint axioms are inferred and which ones are asserted. You could have:
A disjointWith B
B disjointWith C
The reasoner would return {A, B, C} in response to asking for disjoints of A. You would find all three elements in the signature of a disjoint axiom, and conclude that the reasoner has made no inferences. But the axioms in input are not the same as the axioms in output (many reasoners would in fact run absorption on the input axioms and transform them to an internal representation that is an axiom with three operands).
So, my definition of asserted (as opposed to inferred) would be that the set of nodes returned by the reasoner is the same as the set of operands of one disjoint axiom. To verify this condition, I would take all disjoint axioms, extract the set of operands and keep those sets in a set of sets. Then,