sparql jena arq

Jena/ARQ: Query processing gets stuck


I have a problem with the following SPARQL query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX :<http://www.test.at/DA.owl#>
SELECT ?ModuleName ?SLAName ?SLOName ?SLOTypeName ?SLOTHV (AVG(DISTINCT ?SLTTV) AS ?AVGTrackValue)  (COUNT(DISTINCT ?SLT) AS ?SLTCOUNT) 
WHERE {
    ?Module rdf:type :ServiceModule .
    ?SLA rdf:type :ServiceLevelAgreement.
    ?SLO rdf:type :ServiceLevelObjective .
    ?SLT rdf:type :ServiceLevelTracking .
    ?Day rdf:type :Day .
    ?Holiday rdf:type :Holiday .
    ?rel1 rdf:type :RelationFact .
    ?rel2 rdf:type :RelationFact .
    ?rel3 rdf:type :RelationFact .
    ?rel4 rdf:type :RelationFact .
    ?rel5 rdf:type :RelationFact .
    ?Module :hasName ?ModuleName .
    ?SLA :hasName ?SLAName .
    ?SLO :hasName ?SLOName .
    ?SLO :hasType ?SLOType . 
    ?SLO :hasThresholdValue ?SLOTHV .
    ?SLOType :hasName ?SLOTypeName .            
    ?SLT :hasDayName ?SLTDayName .
    ?SLT :hasType ?SLTType .        
    ?SLT :hasTrackedDateTime ?trackTime .
    ?SLT :hasTrackedValue ?SLTTV .
    ?Day :hasDayName ?DayName .
    ?Holiday :hasDate ?HolidayDate .
    ?Holiday :hasStartTime ?HolidayStartTime .
    ?Holiday :hasEndTime ?HolidayEndTime .      
    ?rel1 :hasParent ?Module .
    ?rel1 :hasChild ?SLA .  
    ?rel2 :hasParent ?Module.
    ?rel2 :hasChild ?SLT .      
    ?rel3 :hasParent ?SLA .
    ?rel3 :hasChild ?SLO .
    ?rel4 :hasParent ?SLA .
    ?rel4 :hasChild ?Day .
    ?rel5 :hasParent ?SLA .
    ?rel5 :hasChild ?Holiday .  
    Filter(regex(str(?ModuleName), "E-mail")) .
    Filter(?SLOType = ?SLTType) .
    Filter(xsd:dateTime(?trackTime)  >=  xsd:dateTime("2012-08-15T12:00:00") && ?trackTime  <  xsd:dateTime("2012-08-15T13:00:00")) .
    Filter(?DayName = ?SLTDayName || (xsd:dateTime("2012-08-15T00:00:00") = ?HolidayDate && xsd:dateTime(?trackTime) >= xsd:dateTime("2012-08-15T12:00:00") &&  xsd:dateTime(?trackTime) < xsd:dateTime("2012-08-15T14:00:00")))
} 
GROUP BY ?ModuleName ?SLAName ?SLOName ?SLOTypeName ?SLOTHV
HAVING (?AVGTrackValue < ?SLOTHV)

With Protégé 4.2 this query works without any problems and returns a result within 1 second. A syntax check with the SPARQLer Query Validator (http://www.sparql.org/query-validator.html) also confirms that the query is valid. With the Jena ARQ engine, however, query processing gets stuck every time while waiting for the result set. I tried it with jena-arq-2.9.1 from the command line and also in a Java application with the following code:

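    // Query, QueryFactory, QueryExecution, QueryExecutionFactory, ResultSet and
    // QuerySolution come from com.hp.hpl.jena.query in Jena 2.x (ARQ 2.9.1).
    Query q = QueryFactory.create(queryString);
    QueryExecution qexec = QueryExecutionFactory.create(q, currentOntologyModel);
    try {
        ResultSet results = qexec.execSelect();
        while (results.hasNext()) {
            QuerySolution soln = results.nextSolution();
            // ... <some other code> ...
        }
    } finally {
        // Always release the resources held by the query execution.
        qexec.close();
    }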

Maybe someone has an idea what the problem is.


Solution

  • Presumably, the data is in-memory.

    It looks like the query optimizer is not finding an efficient plan. The unconnected rdf:type patterns at the start are possibly causing intermediate cross products to be computed, which is inefficient.

    Reordering the triple patterns might help (maybe move the rdf:type patterns to the end?). The query on its own is not enough to know; it depends on the data. If you find a faster order, please send it to the Jena users list at Apache.
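
As a rough illustration of that reordering, here is one possible arrangement of the same query. It is only a sketch and has not been run against the asker's data, so it may or may not be faster. The order chosen (start at the module name, follow the RelationFact links outward, put the rdf:type patterns last) is an assumption about which patterns are selective. The triple patterns and filters themselves are unchanged, so the results should be the same as for the original query.

```sparql
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX : <http://www.test.at/DA.owl#>
SELECT ?ModuleName ?SLAName ?SLOName ?SLOTypeName ?SLOTHV
       (AVG(DISTINCT ?SLTTV) AS ?AVGTrackValue)
       (COUNT(DISTINCT ?SLT) AS ?SLTCOUNT)
WHERE {
    # Start from the module, which the regex filter below restricts.
    ?Module :hasName ?ModuleName .

    # Module -> SLA (each pattern shares a variable with an earlier one).
    ?rel1 :hasParent ?Module .
    ?rel1 :hasChild ?SLA .
    ?SLA :hasName ?SLAName .

    # SLA -> SLO
    ?rel3 :hasParent ?SLA .
    ?rel3 :hasChild ?SLO .
    ?SLO :hasName ?SLOName .
    ?SLO :hasType ?SLOType .
    ?SLO :hasThresholdValue ?SLOTHV .
    ?SLOType :hasName ?SLOTypeName .

    # Module -> SLT
    ?rel2 :hasParent ?Module .
    ?rel2 :hasChild ?SLT .
    ?SLT :hasType ?SLTType .
    ?SLT :hasTrackedDateTime ?trackTime .
    ?SLT :hasDayName ?SLTDayName .
    ?SLT :hasTrackedValue ?SLTTV .

    # SLA -> Day
    ?rel4 :hasParent ?SLA .
    ?rel4 :hasChild ?Day .
    ?Day :hasDayName ?DayName .

    # SLA -> Holiday
    ?rel5 :hasParent ?SLA .
    ?rel5 :hasChild ?Holiday .
    ?Holiday :hasDate ?HolidayDate .
    ?Holiday :hasStartTime ?HolidayStartTime .
    ?Holiday :hasEndTime ?HolidayEndTime .

    # rdf:type patterns last: every variable is already bound by now,
    # so these act as checks rather than as unconnected starting points.
    ?Module rdf:type :ServiceModule .
    ?SLA rdf:type :ServiceLevelAgreement .
    ?SLO rdf:type :ServiceLevelObjective .
    ?SLT rdf:type :ServiceLevelTracking .
    ?Day rdf:type :Day .
    ?Holiday rdf:type :Holiday .
    ?rel1 rdf:type :RelationFact .
    ?rel2 rdf:type :RelationFact .
    ?rel3 rdf:type :RelationFact .
    ?rel4 rdf:type :RelationFact .
    ?rel5 rdf:type :RelationFact .

    # Filters copied unchanged from the original query.
    FILTER(regex(str(?ModuleName), "E-mail")) .
    FILTER(?SLOType = ?SLTType) .
    FILTER(xsd:dateTime(?trackTime)  >=  xsd:dateTime("2012-08-15T12:00:00") && ?trackTime  <  xsd:dateTime("2012-08-15T13:00:00")) .
    FILTER(?DayName = ?SLTDayName || (xsd:dateTime("2012-08-15T00:00:00") = ?HolidayDate && xsd:dateTime(?trackTime) >= xsd:dateTime("2012-08-15T12:00:00") &&  xsd:dateTime(?trackTime) < xsd:dateTime("2012-08-15T14:00:00")))
}
GROUP BY ?ModuleName ?SLAName ?SLOName ?SLOTypeName ?SLOTHV
HAVING (?AVGTrackValue < ?SLOTHV)
```

The idea is that every pattern after the first shares at least one variable with a pattern above it, so an engine that evaluates the patterns roughly in the order written never has to join two unrelated sets of bindings. Whether ARQ keeps this order depends on its optimizer settings and on the data.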