namespaces owl xml-namespaces ontology rdflib

Parsing an ontology in Python using rdflib and Google Colab


I have an ontology in an OWL file (quran_data_full.owl) that I saved in a folder on my Google Drive (Quran Corpus). I now want to run some queries against this ontology, and this is my first time working with an ontology file. After searching the web, I wrote this code:

import logging
import rdflib
import time

logging.basicConfig()
logger = logging.getLogger('logger')
logger.warning('The system may break down')

start_time = time.time()

g = rdflib.Graph()
g.parse ('/gdrive/MyDrive/Quran Corpus/quran_data_full.owl', format='application/rdf+xml')
quran_data_full = rdflib.Namespace('/gdrive/MyDrive/Quran Corpus/')
g.bind('quran_data_full', quran_data_full)
query = """
SELECT Distinct ?verse ?text  ?word1 ?wordText1 
WHERE { 
?verse rdf:type qur:Verse.
?verse rdfs:label ?textSimple.
?verse qur:DisplayText ?text.
?word1 qur:IsPartOf ?verse.
?word1 rdfs:label ?wordText1.
?word1 qur:WordLemma ?wordLema1.
?word1 qur:WordRoot ?wordRoot1.
FILTER(?wordText1 = "الوصية"@ar || ?wordLema1 = "الوصية"@ar || ?wordRoot1 = "الوصية"@ar)
}
LIMIT 25
        """
result = g.query(query)
print(result.serialize(format='csv'))

print("--- %s seconds ---" % (time.time() - start_time)) 

This gave me this error: Exception: Unknown namespace prefix : qur

I don't know what the namespace should be. The quran_data_full.owl file starts with these tags:

<!DOCTYPE rdf:RDF [
    <!ENTITY dcterms "http://purl.org/dc/terms/" >
    <!ENTITY foaf "http://xmlns.com/foaf/0.1/" >
    <!ENTITY owl "http://www.w3.org/2002/07/owl#" >
    <!ENTITY swrl "http://www.w3.org/2003/11/swrl#" >
    <!ENTITY swrlb "http://www.w3.org/2003/11/swrlb#" >
    <!ENTITY xsd "http://www.w3.org/2001/XMLSchema#" >
    <!ENTITY qur "http://quranontology.com/Resource/" >
    <!ENTITY skos "http://www.w3.org/2004/02/skos/core#" >
    <!ENTITY rdfs "http://www.w3.org/2000/01/rdf-schema#" >
    <!ENTITY ace_lexicon "http://attempto.ifi.uzh.ch/ace_lexicon#" >
    <!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#" >
    <!ENTITY protege "http://protege.stanford.edu/plugins/owl/protege#" >
    <!ENTITY xsp "http://www.owl-ontologies.com/2005/08/07/xsp.owl#" >
]>



<rdf:RDF xmlns="http://quranontology.com/Resource/"
     xml:base="http://quranontology.com/Resource/"
     xmlns:protege="http://protege.stanford.edu/plugins/owl/protege#"
     xmlns:foaf="http://xmlns.com/foaf/0.1/"
     xmlns:xsp="http://www.owl-ontologies.com/2005/08/07/xsp.owl#"
     xmlns:dcterms="http://purl.org/dc/terms/"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:swrl="http://www.w3.org/2003/11/swrl#"
     xmlns:qur="http://quranontology.com/Resource/"
     xmlns:owl="http://www.w3.org/2002/07/owl#"
     xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
     xmlns:swrlb="http://www.w3.org/2003/11/swrlb#"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:skos="http://www.w3.org/2004/02/skos/core#">
    <owl:Ontology rdf:about="http://quranontology.com/Resource/">
        <dcterms:contributor rdf:datatype="&xsd;string">Aimad Hakkoum</dcterms:contributor>
        <dcterms:description xml:lang="en">The Quran ontologies provides elements to describe the content of the quran.</dcterms:description>
        <dcterms:title xml:lang="en">The Quran ontology</dcterms:title>
        <rdfs:comment xml:lang="en">version 1.0 :
            Concept definded: Chapter, Verse, Word, Pronoun reference, topic , Location, Living Creation.
      </rdfs:comment>
        <dcterms:creator rdf:resource="http://www.researchgate.net/profile/Aimad_Hakkoum2"/>
    </owl:Ontology>

You can see the full content of the quran_data_full.owl file here.


Solution

  • I just figured out the cause of the error: the query uses the qur: prefix without declaring it. I need to add PREFIX declarations at the top of the query.

    import logging
    import rdflib
    import time
    
    logging.basicConfig()
    logger = logging.getLogger('logger')
    logger.warning('The system may break down')
    
    start_time = time.time()
    
    g = rdflib.Graph()
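    g.parse('/gdrive/MyDrive/Quran Corpus/quran_data_full.owl', format='application/rdf+xml')
    # Note: this binds a prefix to the Drive folder path, not to the ontology's
    # namespace, and the query below does not use it.
    quran_data_full = rdflib.Namespace('/gdrive/MyDrive/Quran Corpus/')
    g.bind('quran_data_full', quran_data_full)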
    query = """
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX qur: <http://quranontology.com/Resource/>
    SELECT Distinct ?verse ?text  ?word1 ?wordText1 
    WHERE { 
    ?verse rdf:type qur:Verse.
    ?verse rdfs:label ?textSimple.
    ?verse qur:DisplayText ?text.
    ?word1 qur:IsPartOf ?verse.
    ?word1 rdfs:label ?wordText1.
    ?word1 qur:WordLemma ?wordLema1.
    ?word1 qur:WordRoot ?wordRoot1.
    FILTER(?wordText1 = "الوصية"@ar || ?wordLema1 = "الوصية"@ar || ?wordRoot1 = "الوصية"@ar)
    }
    LIMIT 25
            """
    result = g.query(query)
    print(result.serialize(format='csv'))
    
    print("--- %s seconds ---" % (time.time() - start_time))
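
Since the original question was which namespace to use, here is a minimal sketch of reading the prefixes straight from the file header and generating the PREFIX lines from them instead of typing them by hand. It uses only the standard library. The helper names read_namespaces and with_prefixes are made up for this illustration, and SAMPLE_HEADER is a trimmed copy of the header shown in the question, so treat it as a starting point rather than a tested recipe for the full file.

```python
import io
import xml.etree.ElementTree as ET

# Trimmed copy of the header from the question (only a few xmlns entries kept).
SAMPLE_HEADER = b"""<?xml version="1.0"?>
<rdf:RDF xmlns="http://quranontology.com/Resource/"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:qur="http://quranontology.com/Resource/"
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
</rdf:RDF>
"""


def read_namespaces(source):
    """Return {prefix: uri} for the xmlns declarations on the root element.

    `source` is a file path or a binary file-like object (hypothetical helper).
    """
    namespaces = {}
    for event, payload in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start":
            # The root element has started, so all of its xmlns
            # declarations have already been reported; stop reading.
            break
        prefix, uri = payload
        if prefix:  # skip the default (unprefixed) namespace
            namespaces[prefix] = uri
    return namespaces


def with_prefixes(query, namespaces):
    """Prepend one SPARQL PREFIX line per namespace (hypothetical helper)."""
    header = "\n".join(
        f"PREFIX {prefix}: <{uri}>" for prefix, uri in sorted(namespaces.items())
    )
    return header + "\n" + query


namespaces = read_namespaces(io.BytesIO(SAMPLE_HEADER))
query = with_prefixes(
    "SELECT ?verse WHERE { ?verse rdf:type qur:Verse . } LIMIT 5", namespaces
)
print(query)
```

With the real file, read_namespaces('/gdrive/MyDrive/Quran Corpus/quran_data_full.owl') should return the prefixes declared in its header, and the string built by with_prefixes can be passed to g.query as before. rdflib's Graph.query also accepts an initNs argument (a mapping from prefix to namespace), which is another way to supply the prefixes without editing the query text.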