
Is there any way to optimize SPARQL queries?


I have unmanaged triples embedded in individual documents in my content database. Each document represents a person, and its triple gives the document URI of that person's manager. I am trying to use SPARQL to determine the path length between a manager and everyone below them in the hierarchy.

The triples in each document look like this:

<sem:triple xmlns:sem="http://marklogic.com/semantics">
    <sem:subject>http://rdf.abbvienet.com/infrastructure/person/10740024</sem:subject>
    <sem:predicate>http://schemas.abbvienet.com/ontologies/infrastructure.owl#manager</sem:predicate>
    <sem:object>http://rdf.abbvienet.com/infrastructure/person/10206242</sem:object>
</sem:triple>

I have found the following SPARQL query, which returns a manager, a person below them in the hierarchy, and how many levels apart they are.

select  ?manager ?leaf (count(?mid) as ?distance) { 
  BIND(<http://rdf.abbvienet.com/infrastructure/person/10025613> as ?manager)
  ?leaf <http://schemas.abbvienet.com/ontologies/infrastructure.owl#manager>* ?mid .
  ?mid <http://schemas.abbvienet.com/ontologies/infrastructure.owl#manager>+ ?manager .
}
group by ?manager ?leaf 
order by ?manager ?leaf

This works, but it is very slow: around 15s, even when the hierarchy below the manager is only one or two levels deep. I have 63,139 manager triples of this type in the database.
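
To make the counting trick in the query concrete, here is a minimal Python sketch of what it computes. It is not MarkLogic code. The names `MANAGER_OF` and `distances_below` and the toy data are hypothetical, and it assumes each person has exactly one manager. Every node visited on the walk from `?leaf` up to, but not including, `?manager` corresponds to one `?mid` binding, so `count(?mid)` equals the distance.

```python
# Hypothetical toy hierarchy: person -> that person's manager.
MANAGER_OF = {
    "bob": "alice",
    "carol": "alice",
    "dave": "bob",
    "erin": "dave",
}


def distances_below(manager, manager_of):
    """Return {person: distance} for everyone below `manager`.

    Mirrors the SPARQL pattern for a tree-shaped hierarchy:
      ?leaf manager* ?mid   -> ?mid is ?leaf or one of its ancestors
      ?mid  manager+ ?manager -> ?mid sits strictly below ?manager
    """
    distances = {}
    for leaf in manager_of:
        count = 0
        mid = leaf
        seen = set()  # guard against accidental cycles in the data
        while mid in manager_of and mid not in seen:
            seen.add(mid)
            count += 1  # one ?mid binding per node on the way up
            mid = manager_of[mid]
            if mid == manager:
                distances[leaf] = count
                break
    return distances


if __name__ == "__main__":
    for person, distance in sorted(distances_below("alice", MANAGER_OF).items()):
        print(person, distance)
```

A direct report gets distance 1 because the only `?mid` binding is the report itself.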


Solution

  • I think the biggest problem is the BIND(): MarkLogic 8 doesn't optimize the pattern you're using well. Can you try substituting your constant wherever you use the ?manager variable, to see whether that makes a big difference? For example:

    select  ?leaf (count(?mid) as ?distance) { 
      ?leaf <http://schemas.abbvienet.com/ontologies/infrastructure.owl#manager>* ?mid .
      ?mid <http://schemas.abbvienet.com/ontologies/infrastructure.owl#manager>+
        <http://rdf.abbvienet.com/infrastructure/person/10025613> .
    }
    group by ?leaf 
    order by ?leaf
    

    StackOverflow isn't a great place for performance questions like this, since they really need a back-and-forth conversation. You could try contacting support or the MarkLogic developer mailing list instead.
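
If the manager IRI changes from request to request, the substitution suggested in this answer can be done by a small helper that writes the constant into the query text instead of using BIND(). The following is only a sketch: `build_distance_query` is a hypothetical name, the function just builds a string, and how you then submit it to MarkLogic is left open. The character check follows the SPARQL grammar for IRI references, so that an unexpected value cannot break out of the angle brackets.

```python
import re

MANAGER_PREDICATE = (
    "http://schemas.abbvienet.com/ontologies/infrastructure.owl#manager"
)

# Characters SPARQL does not allow inside an IRI reference (<...>).
_FORBIDDEN_IN_IRI = re.compile(r'[<>"{}|^`\\\x00-\x20]')


def build_distance_query(manager_iri, predicate=MANAGER_PREDICATE):
    """Return the query text with the manager IRI inlined as a constant."""
    for iri in (manager_iri, predicate):
        if _FORBIDDEN_IN_IRI.search(iri):
            raise ValueError(f"not a valid IRI: {iri!r}")
    return (
        "select ?leaf (count(?mid) as ?distance) {\n"
        f"  ?leaf <{predicate}>* ?mid .\n"
        f"  ?mid <{predicate}>+ <{manager_iri}> .\n"
        "}\n"
        "group by ?leaf\n"
        "order by ?leaf"
    )


if __name__ == "__main__":
    print(build_distance_query(
        "http://rdf.abbvienet.com/infrastructure/person/10025613"
    ))
```

Whether this form is actually faster on your data is something to measure; the sketch only removes the need to edit the query by hand.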