python sparql rdflib stardog named-graphs

What does the "identifier" in "Graph" do?


I am trying to query a database like this:

from rdflib import Graph, Literal, URIRef
from rdflib.namespace import RDF, SKOS
from rdflib.plugins.stores import sparqlstore


# define endpoint according to https://www.stardog.com/docs/
endpoint = 'http://path/to/query'  # http://<server>:<port>/{db}/query

# create store
store = sparqlstore.SPARQLUpdateStore()

# I only want to query
store.open(endpoint)
store.setCredentials('me', 'my_pw')

# What does this actually do? With an identifier like this, the code runs through
default_graph = URIRef('some:stuff')
ng = Graph(store, identifier=default_graph)
# # If identifier is not defined, it crashes
# ng = Graph(store)

rq = """
SELECT ?foo ?bar 
WHERE {
  ?something a <http://path/to/data/.ttl#SomeValues>.
  ?something <http://path/to/data/.ttl#foo> ?foo.
  ?something <http://path/to/data/.ttl#bar> ?bar.                       
}
"""

query_res = ng.query(rq)
for s, l in query_res:
    print(s, l)

Unfortunately, I don't get any results at the moment:

<head><variable name="foo"></variable><variable name="bar"></variable></head><results></results></sparql>

My question is what the identifier in Graph does, i.e. whether it is important and, if so, how it should be defined. When I do not define it, the code crashes with:

Response: b'{"message":"No separator character found in the URI: N53e412e0f3a74d6eab7ed6da163463bf"}'

If I put in anything else that has a colon or a slash in it, the code runs through (but the query still does not return anything).

Could anyone briefly explain what one should put in there, and whether this might be the cause of the unsuccessful query (the query itself is correct; when I run it from another tool, it works fine)?


Solution

  • The identifier argument of the Graph constructor names an RDFLib graph. If the value is None, a blank node is used as the identifier.

    However, if the store is a SPARQLUpdateStore, the identifier is also sent as the default-graph-uri parameter of the SPARQL Protocol, and hence cannot be a blank node. That is what the error in your question shows: N53e412e0f3a74d6eab7ed6da163463bf is an auto-generated blank node identifier, which Stardog rejects because it is not a URI.

    Thus, the problem is: what is the name of the default "unnamed" graph in a remote triplestore?

    From Stardog's documentation:

    Naming

    Stardog includes aliases for several commonly used sets of named graphs. These non-standard extensions are provided for convenience and can be used wherever named graph IRIs are expected. This includes SPARQL queries & updates, property graph operations and configuration values. Following is a list of special named graph IRIs.

              Named Graph IRI                             Refers to                
    --------------------------------  ---------------------------------------------
    tag:stardog:api:context:default   the default (no) context graph              
    tag:stardog:api:context:all       all contexts, including the default graph    
    tag:stardog:api:context:named     all named graphs, excluding the default graph
    

    I can't find any public or private Stardog endpoint to test against (it seems that ABS's endpoint is down), so here is an example on DBpedia:

    from rdflib import Graph, URIRef
    from rdflib.plugins.stores import sparqlstore
    
    store = sparqlstore.SPARQLUpdateStore()
    store.open('http://dbpedia.org/sparql')
    
    default_graph = URIRef('http://people.aifb.kit.edu/ath/#DBpedia_PageRank') 
    ng = Graph(store, identifier=default_graph)
    
    rq = """
        SELECT ?foo ?foobar {
          ?foo ?foobar ?bar                       
        } LIMIT 100
    """
    
    query_res = ng.query(rq)
    for s, l in query_res:
        print(s, l)
    

    The results look as expected. In your code as well, the name of the unnamed graph is the only problem: the response you received is a valid SPARQL XML result, just an empty one.


    P.S. Possibly you could try instead of for your purpose.
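
To make the default-graph-uri point above more concrete, here is a minimal sketch using only the Python standard library. It builds roughly the kind of GET URL that the SPARQL Protocol describes when a default graph is supplied. The helper `build_query_url` and the endpoint `http://example.org/db/query` are hypothetical and only for illustration; this is not RDFLib's actual implementation. The graph name is the Stardog alias from the table above.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(endpoint, query, default_graph=None):
    """Build a SPARQL Protocol GET URL.

    Hypothetical helper for illustration; it is not part of RDFLib.
    """
    params = [("query", query)]
    if default_graph is not None:
        # The graph identifier travels as the default-graph-uri parameter.
        params.append(("default-graph-uri", default_graph))
    return endpoint + "?" + urlencode(params)


# Placeholder endpoint; the alias is taken from the Stardog documentation table.
url = build_query_url(
    "http://example.org/db/query",
    "SELECT ?s { ?s ?p ?o } LIMIT 1",
    "tag:stardog:api:context:default",
)
print(url)
# Decode the query string again to see which graph the server is asked to use.
print(parse_qs(urlsplit(url).query)["default-graph-uri"])
```

If this reading is right, passing `URIRef('tag:stardog:api:context:default')` as the identifier in the question's code should make Stardog evaluate the query against its default graph. This has not been checked against a live Stardog server.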