xml python-3.6 rdf ontology rdflib

Is there a way to quickly access all annotations and sub-annotations from an OWL (RDF/XML) file?


I have an ontology that I built in Protégé, and it has annotations and sub-annotations. By that I mean that a concept might have a definition, and that definition might itself have a comment.

So you might have something like (s,p,o):

'http://purl.fakeiri.org/ONTO/1111' --> 'label' --> 'Term'

'Term' --> 'comment' --> 'Comment about term.'

I am trying to make the ontology easily explorable using a Flask app (I'm using Python to parse the ontology file), and I can't seem to quickly get all of the annotations and sub-annotations.

I started with the owlready2 package, but it requires you to reference each annotation property explicitly. You can't just get a list of all of them, so if you add a property like random_identifier you have to go back into the code and add entity.random_identifier or it won't be picked up. This works okay and is pretty fast, but sub-annotations require loading the property by its IRI and then looking it up like this:

random_prop = IRIS['http://schema.org/fillerName']
sub_annotation = x[entity, random_prop, annotation_label]

This is extremely slow: it takes 5-10 minutes to search through around 140 sub-annotation types, compared to about 3-5 seconds for the annotations alone.

From there I decided to scrap owlready2 and try rdflib. However, it looks like sub-annotations are just attached as BNodes, and I can't figure out how to reach them through their "parent" annotation, or whether that's even possible.

TL;DR: Does anybody know how to access an entry and quickly gather all of its annotations and sub-annotations in an RDF/XML ontology file?

EDIT 1:

As suggested, here is a snippet of the ontology:

    <!-- http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C42610 -->

    <owl:Class rdf:about="http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C42610">
        <rdfs:subClassOf rdf:resource="http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C42698"/>
        <obo:IAO_0000115 xml:lang="en">A shortened form of a word or phrase.</obo:IAO_0000115>
        <oboInOwl:hasDbXref rdf:datatype="http://www.w3.org/2001/XMLSchema#anyURI">https://en.wikipedia.org/wiki/Abbreviation</oboInOwl:hasDbXref>
        <rdfs:label xml:lang="en">abbreviation</rdfs:label>
        <schema:alternateName xml:lang="en">abbreviations</schema:alternateName>
        <Property:P1036 rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">411</Property:P1036>
    </owl:Class>
    <owl:Axiom>
        <owl:annotatedSource rdf:resource="http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C42610"/>
        <owl:annotatedProperty rdf:resource="https://www.wikidata.org/wiki/Property:P1036"/>
        <owl:annotatedTarget rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">411</owl:annotatedTarget>
        <schema:bookEdition rdf:datatype="http://www.w3.org/2001/XMLSchema#integer">20</schema:bookEdition>
    </owl:Axiom>

Thank you all so much!


Solution

  • It turns out I was overlooking the obvious: I updated owlready2 from 0.18 to 0.22 and it's lightning fast now.