python, uri, html-parsing, rdf, ontology

How to read URIs from RDFLib using Python?


I have several thousand URIRef ontology values for which I'm trying to get a string representation:

[rdflib.term.URIRef('http://purl.obolibrary.org/obo/RO_0002219'),
 rdflib.term.URIRef('http://purl.obolibrary.org/obo/RO_0002551'),
 rdflib.term.URIRef('http://purl.obolibrary.org/obo/uberon/core#indirectly_supplies')]

I could visit each link individually (e.g. http://purl.obolibrary.org/obo/RO_0002219) and read off the value (e.g. "surrounded by"), but how can I do this with Python? I see two possible approaches, but I couldn't get either to work. One is to use the RDFLib library, but I didn't find a function that translates the link. The other is to parse the HTML behind each link and extract the value shown in red (I think that corresponds to the translation).

Note that some of them don't resolve to anything (e.g. http://purl.obolibrary.org/obo/uberon/core#indirectly_supplies returns 404: Not Found).


Solution

  • Since those URIs support RDF content negotiation, you can fetch the RDF for each one and load it into a graph, as shown below. Once you have the graph, you can use SPARQL to query whichever properties you want from it. The example below fetches the rdfs:label of each of your subjects. I removed one of the URIs you provided since it returns a 404.

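    from rdflib import Graph, URIRef

    uris = [
        URIRef('http://purl.obolibrary.org/obo/RO_0002219'),
        URIRef('http://purl.obolibrary.org/obo/RO_0002551'),
    ]

    # Declare the rdfs prefix explicitly so the query does not depend on
    # the graph's default namespace bindings.
    query = """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?label WHERE {
        ?subject rdfs:label ?label .
    }
    """

    for uri in uris:
        g = Graph()
        g.parse(uri)  # fetches the RDF behind the URI over HTTP
        # Bind ?subject to the current URI instead of concatenating strings.
        res = g.query(query, initBindings={'subject': uri})
        for result in res:
            print(result)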
    

    This produces the following output:

    (rdflib.term.Literal('surrounded by', lang='en'),)
    (rdflib.term.Literal('has skeleton'),)