Tags: sparql, graphdb, rdflib, sparqlwrapper

SPARQL Namespace conflict while querying


I'm quite puzzled by the behavior of my endpoint, or perhaps by how the request is processed on the client side. The standard RDFS namespace seems to clash with another definition at query time: declaring the prefix in the query results in an error, while omitting the declaration and just using the prefix in the query body returns normal output.

Setup

Query 1:

SELECT *
WHERE {
  ?sub rdfs:label ?p .
} LIMIT 5

Output 1:

INFO:root:                                             sub                   p         
0  http://example.org/triples/17bbab96          Pont d Iéna-9423efbc
1  http://example.org/triples/37d3fba1          Pont d Iéna-9423efbc
2  http://example.org/triples/e8a8921a          Pont Transbordeur-fb62b01e
3  http://example.org/triples/7907d1de          Pont Transbordeur-fb62b01e
4  http://example.org/triples/5b529b5e          Pont d Iéna-98cdd2fc

Query 2:

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT *
WHERE {
  ?sub rdfs:label ?p .
} LIMIT 5

Output 2 (Client Side):

(...)
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
(...)
ValueError: You did something wrong formulating either the URI or your SPARQL query

Output 2 (Server Side):

[INFO ] 2023-07-13 08:50:13,797 [repositories/astra1 | c.o.f.s.GraphDBProtocolExceptionResolver] X-Request-Id: 712a09f4-626e-5f2a-b22b-5d436e2c4ae2 Client sent bad request (400)
org.eclipse.rdf4j.http.server.ClientHTTPException: MALFORMED QUERY: Multiple prefix declarations for prefix 'rdfs'

Querying with rdflib against a GraphDB endpoint (RDF4J)

import os
import logging
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

from dotenv import load_dotenv
load_dotenv()

from rdflib import Graph
from rdflib.plugins.stores import sparqlstore
from rdflib.plugins.sparql.processor import SPARQLResult

from requests.auth import HTTPDigestAuth

from pandas import DataFrame

def sparql_results_to_df(results: SPARQLResult) -> DataFrame:
    """
    Export results from an rdflib SPARQL query into a `pandas.DataFrame`,
    using Python types. See https://github.com/RDFLib/rdflib/issues/1179.
    """
    return DataFrame(
        data=([None if x is None else x.toPython() for x in row] for row in results),
        columns=[str(x) for x in results.vars],
    )


if __name__ == '__main__':
  store = sparqlstore.SPARQLUpdateStore(
      query_endpoint=os.environ['SPARQL_ENDPOINT_QUERY'],
      update_endpoint=os.environ['SPARQL_ENDPOINT_UPDATE'],
      # auth=HTTPDigestAuth(config.AUTH_USER, config.AUTH_PASS), context_aware=True,
  )

  g = Graph(store=store, identifier=os.environ['SPARQL_DEFAULT_NAMED_GRAPH_FULL_URI'])  # namespace_manager=None

  q_sa ="""
    select * where { 
      ?s ?p ?o .
    } limit 20 
    """
  
  q_sa2 = """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT *
    WHERE {
      ?sub rdfs:label ?p .
    } LIMIT 20
  """
  
  qr = g.query(q_sa2)
  df = sparql_results_to_df(qr)
  logging.info(df)
  

Expectation

I'd have expected the opposite: Query 1 failing with an "Undefined Prefix" error and Query 2 returning my results. Is there a way to get that behavior by changing something client side or server side? Is this a bad idea? (I prefer to have everything in the queries, even the most basic namespaces.)

I'd be glad to read your thoughts on that. Thanks in advance for your answers!


Solution

  • Thanks @UninformedUser, you put me on the right track! It was hard to tell where the error originated (rdflib's Graph? sparqlstore? the endpoint configuration?).

    Alas, passing an empty initNs doesn't work: an empty dict is falsy, so in the source it is overridden with the graph's default namespaces: initNs = initNs or dict(self.namespaces()) # noqa: N806

    According to "Namespace bindings" in the RDFLib docs, each graph ships with default namespace bindings.

    The solution, then, is to override the default graph configuration: g = Graph(store=store, identifier=os.environ['SPARQL_DEFAULT_NAMED_GRAPH_FULL_URI'], bind_namespaces="none")

    Solved! (I'll mark it as accepted in 2 days.)
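
For anyone who wants to see the mechanism in isolation, here is a minimal pure-Python sketch of what seems to be happening. It does not call rdflib at all: `resolve_init_ns`, `inject_prefixes`, `declared_prefixes` and `DEFAULT_BINDINGS` are hypothetical stand-ins written for this illustration. It assumes the client prepends one PREFIX line per bound namespace before sending the query, which would explain the server-side "Multiple prefix declarations" message.

```python
import re

# Hypothetical stand-in for a graph's default bindings (only rdfs shown).
DEFAULT_BINDINGS = {"rdfs": "http://www.w3.org/2000/01/rdf-schema#"}


def resolve_init_ns(init_ns, graph_bindings):
    """Mimic the quoted line `initNs = initNs or dict(self.namespaces())`."""
    # An empty dict is falsy, so `or` falls through to the graph bindings.
    return init_ns or dict(graph_bindings)


def inject_prefixes(query, bindings):
    """Prepend one PREFIX line per binding (simplified, assumed behavior)."""
    header = "\n".join(f"PREFIX {p}: <{ns}>" for p, ns in bindings.items())
    return f"{header}\n{query}" if header else query


def declared_prefixes(query):
    """List the prefixes declared in a query, in order, duplicates kept."""
    return re.findall(r"(?im)^\s*PREFIX\s+([\w-]*):", query)


QUERY_2 = """PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT * WHERE { ?sub rdfs:label ?p . } LIMIT 5"""

# Passing an empty initNs does not help: the defaults come back.
bindings = resolve_init_ns({}, DEFAULT_BINDINGS)
sent = inject_prefixes(QUERY_2, bindings)
print(declared_prefixes(sent))  # ['rdfs', 'rdfs']

# With no default bindings (the effect of bind_namespaces="none"),
# the query goes out unchanged and rdfs is declared exactly once.
sent_clean = inject_prefixes(QUERY_2, resolve_init_ns({}, {}))
print(declared_prefixes(sent_clean))  # ['rdfs']
```

The same reasoning would explain why Query 1 worked without any declaration: the default bindings supplied the rdfs prefix on its behalf. Once the graph has no default bindings, every prefix has to be declared in the query itself, which is the behavior I was after.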