I have my data organised in multiple graphs. The graph in which a triple is saved matters. The data structure is complicated but it can be simplified like this:
My store contains cakes. There is a hierarchy of different cake types, all of them subclasses of <cake>:
<http://example.com/a1> a <http://example.com/applecake> .
<http://example.com/a2> a <http://example.com/rainbowcake> .
...
Depending on how they get created by a user in a UI, they end up in a different graph. If for instance the user "bakes" a cake, it goes in the <http://example.com/homemade> graph, if they "buy" one, it goes into the <http://example.com/shopbought> graph.
When I retrieve my cakes from the store, I want to know for each cake whether it's homemade or shopbought. There is no property for this; I want to retrieve the information purely based on the graph the triple is stored in.
I have tried various ways of achieving this, but none of them work in Jena TDB. The problem is that all cakes come back as "shopbought". All of the queries, however, work in Fuseki (on the exact same dataset), and I was wondering whether this is a TDB bug or whether there's another way. Here are the simplified queries (without variations):
Version 1:
SELECT DISTINCT *
FROM <http://example.com/homemade>
FROM <http://example.com/shopbought>
FROM NAMED <http://example.com/homemade>
FROM NAMED <http://example.com/shopbought>
WHERE {
  ?cake rdf:type ?caketype .
  ?caketype rdfs:subClassOf* <cake> .
  {
    GRAPH <http://example.com/homemade> { ?cake rdf:type ?typeHomemade }
  } UNION {
    GRAPH <http://example.com/shopbought> { ?cake rdf:type ?typeShopbought }
  }
  BIND(str(if(bound(?typeHomemade), true, false)) AS ?homemade)
}
Version 2:
SELECT DISTINCT *
FROM <http://example.com/homemade>
FROM <http://example.com/shopbought>
FROM NAMED <http://example.com/homemade>
FROM NAMED <http://example.com/shopbought>
WHERE {
  ?cake rdf:type ?caketype .
  ?caketype rdfs:subClassOf* <cake> .
  GRAPH ?g {
    ?cake rdf:type ?caketype .
  }
  BIND(STR(IF(?g=<http://example.com/homemade>, true, false)) AS ?homemade)
}
Any ideas why this works in Fuseki but not in TDB?
Edit: I'm beginning to think it has something to do with the GRAPH keyword. Here are some much simpler queries (which work in Fuseki and tdbquery) and the results I get using the Jena API:
SELECT * WHERE { GRAPH <http://example.com/homemade> { ?s ?p ?o }}
0 results
SELECT * WHERE { GRAPH ?g { ?s ?p ?o }}
0 results
SELECT * FROM <http://example.com/homemade> WHERE { ?s ?p ?o }
x results
SELECT * FROM <http://example.com/homemade> WHERE { GRAPH <http://example.com/homemade> { ?s ?p ?o }}
0 results
SELECT * FROM NAMED <http://example.com/homemade> WHERE { GRAPH <http://example.com/homemade> { ?s ?p ?o }}
0 results
OK, so the solution actually has to do with the way I executed the query. My initial idea was to pre-filter the dataset so that a query only gets executed on the relevant graphs (the dataset contains many graphs, and they can be quite large, which would make querying "everything" slow). This can be done either by naming the graphs in the SPARQL query or by selecting them directly in Jena (although the latter would not work for other triple stores). Combining both approaches "to be on the safe side", however, does not work.
This query runs on the entire dataset and works as expected:
Query query = QueryFactory.create("SELECT * WHERE { GRAPH ?g { ?s ?p ?o } }", Syntax.syntaxARQ);
QueryExecution qexec = QueryExecutionFactory.create(query, dataset);
ResultSet result = qexec.execSelect();
The same query can also be executed against a specific graph only. In that case it returns no results, no matter which graph is used:
//run only on one graph
Model target = dataset.getNamedModel("http://example.com/homemade");
//OR run on the union of all graphs
Model target = dataset.getNamedModel("urn:x-arq:UnionGraph");
//OR run on a union of specific graphs
Model target = ModelFactory.createUnion(dataset.getNamedModel("http://example.com/shopbought"), dataset.getNamedModel("http://example.com/homemade"), ...);
[...]
QueryExecution qexec = QueryExecutionFactory.create(query, target);
[...]
My workaround is now to always execute against the entire dataset (which supports the SPARQL GRAPH keyword fine) and to specify, in each query, the graphs it should run on, so that it does not have to search everything. I'm not sure whether this is expected behaviour for the Jena API.
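For illustration, a query of this kind might look like the sketch below. It is meant to be executed against the whole dataset (not against a single Model), restricts the named graphs with FROM NAMED, and derives ?homemade from the graph each triple was found in. The rdfs:subClassOf* check from the queries above is left out, because with only FROM NAMED the default graph is empty and where the class hierarchy is stored is not stated here; treat it as a sketch under those assumptions rather than a drop-in replacement.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

# Assumption: executed against the full dataset, e.g. via
# QueryExecutionFactory.create(query, dataset), not against a Model.
SELECT ?cake ?caketype ?homemade
FROM NAMED <http://example.com/homemade>
FROM NAMED <http://example.com/shopbought>
WHERE {
  # ?g is bound to the named graph in which the type triple is stored
  GRAPH ?g { ?cake rdf:type ?caketype }
  # true if the triple came from the homemade graph, false otherwise
  BIND(?g = <http://example.com/homemade> AS ?homemade)
}
```

A plausible reading of the empty results above is that a Model represents a single graph, so a query executed against it sees that graph only as the default graph and has no named graphs for GRAPH to match.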