sparql rdf wikidata rdf4j

RDF4J CONSTRUCT query fails


I'm trying to execute a CONSTRUCT query against Wikidata using the following code snippet:

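import org.eclipse.rdf4j.model.Statement;
import org.eclipse.rdf4j.query.GraphQuery;
import org.eclipse.rdf4j.query.GraphQueryResult;
import org.eclipse.rdf4j.repository.RepositoryConnection;
import org.eclipse.rdf4j.repository.sparql.SPARQLRepository;

String construct = "CONSTRUCT { " +
        "   ?s <http://schema.org/about> ?wikipedia ." +
        "} where { " +
        "   OPTIONAL{ " +
        "      ?wikipedia <http://schema.org/about> ?s ; <http://schema.org/inLanguage> ?language ; <http://schema.org/isPartOf> <https://en.wikipedia.org/> . " +
        "   } " +
        "   ?s ?p1 <http://www.wikidata.org/entity/Q12136> . " +
        "}";
SPARQLRepository repo = new SPARQLRepository("https://query.wikidata.org/sparql");
// try-with-resources closes the connection and the result even if parsing fails
try (RepositoryConnection repositoryConnection = repo.getConnection()) {
    GraphQuery query = repositoryConnection.prepareGraphQuery(construct);
    try (GraphQueryResult rs = query.evaluate()) {
        while (rs.hasNext()) {
            Statement statement = rs.next();
        }
    }
}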

Unfortunately this results in a parse error:

WARN org.eclipse.rdf4j.rio.helpers.ParseErrorLogger - [Rio error] IRI included an unencoded space: '32' (7730, -1)
org.eclipse.rdf4j.query.QueryEvaluationException: org.eclipse.rdf4j.query.QueryEvaluationException: org.eclipse.rdf4j.rio.RDFParseException: IRI included an unencoded space: '32' [line 7730]
    at org.eclipse.rdf4j.query.impl.QueueCursor.convert(QueueCursor.java:58)
    at org.eclipse.rdf4j.query.impl.QueueCursor.convert(QueueCursor.java:22)
    at org.eclipse.rdf4j.common.iteration.QueueIteration.checkException(QueueIteration.java:165)
    at org.eclipse.rdf4j.common.iteration.QueueIteration.getNextElement(QueueIteration.java:134)
    at org.eclipse.rdf4j.common.iteration.LookAheadIteration.lookAhead(LookAheadIteration.java:81)
    at org.eclipse.rdf4j.common.iteration.LookAheadIteration.hasNext(LookAheadIteration.java:49)
    at org.eclipse.rdf4j.common.iteration.IterationWrapper.hasNext(IterationWrapper.java:63)
    at eu.qanswer.mapping.mappings.informa.Refactor.main(Refactor.java:227)

As far as I understand, Wikidata contains some URIs that are not encoded correctly, i.e. they contain a space, so the RDF4J parser complains. Is there a way to configure the parser to be less strict?

Thank you D063520


Solution

  • As you discovered, the problem here is that your query times out on the server side. The error message you get from RDF4J is confusing, but the cause is that the server endpoint does not correctly signal that there is a problem: it sends an HTTP 200 response, so RDF4J assumes everything is OK and starts parsing the response body. Partway through, the server writes an error message into the response body, and the RDF4J parser fails on that text with the error you see.
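
One way to confirm this diagnosis is to save the raw response body (for example with any HTTP client) and look at the line number reported in the RDFParseException, which was 7730 in the trace above. The following is a minimal sketch of such a check in plain Java. The class ResponseBodyInspector and its methods are hypothetical helpers, not part of RDF4J, and the stack-trace heuristic assumes the server's error text looks like a Java exception, which may not hold for every endpoint.

```java
// Hypothetical helper, not part of RDF4J: inspects a raw response body
// around the line number reported in an RDFParseException.
final class ResponseBodyInspector {

    // Returns the lines from (lineNumber - radius) to (lineNumber + radius),
    // each prefixed with its 1-based line number. Out-of-range lines are skipped.
    static String contextAround(String body, int lineNumber, int radius) {
        String[] lines = body.split("\r?\n", -1);
        int first = Math.max(1, lineNumber - radius);
        int last = Math.min(lines.length, lineNumber + radius);
        StringBuilder out = new StringBuilder();
        for (int i = first; i <= last; i++) {
            out.append(i).append(": ").append(lines[i - 1]).append('\n');
        }
        return out.toString();
    }

    // Heuristic (an assumption): the line looks like a Java exception name
    // or a stack frame rather than RDF data.
    static boolean looksLikeServerError(String line) {
        String t = line.trim();
        return t.startsWith("at ")
                || t.matches("([\\w$]+\\.)+[\\w$]*(Exception|Error)(: .*)?");
    }

    public static void main(String[] args) {
        // Made-up body for illustration: one triple followed by error text.
        String body = "<http://example.org/a> <http://schema.org/about> <http://example.org/b> .\n"
                + "java.util.concurrent.TimeoutException\n"
                + "\tat com.example.Server.run(Server.java:1)\n";
        System.out.print(contextAround(body, 2, 1));
    }
}
```

If the lines around the reported position look like an exception message rather than triples, the parse error is a symptom of the timeout, and relaxing the parser settings would not help. Narrowing the query so that it finishes within the server's time limit is the more promising direction.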