
SPARQL Geospatial Queries (MarkLogic)


This carries on from a previous question, where it was noted that fn:doc() should be avoided in SPARQL queries. For geospatial queries, however, I have been unable to find an alternative to the code shown below. I have used this query and its runtime is really slow; on larger data sets it hits the 1-hour timeout.

Hence, I would like to ask whether there is a better way to implement geospatial queries in SPARQL. Is it possible to use GeoSPARQL with PREFIX spatial:<http://jena.apache.org/spatial#>?

xquery version "1.0-ml";
import module namespace sem = "http://marklogic.com/semantics" at "/MarkLogic/semantics.xqy";
import module namespace thsr="http://marklogic.com/xdmp/thesaurus" 
                             at "/MarkLogic/thesaurus.xqy";

let $query := sem:sparql(
'
PREFIX xs: <http://www.w3.org/2001/XMLSchema#>
PREFIX cts: <http://marklogic.com/cts#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX fn: <http://www.w3.org/2005/xpath-functions#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX db: <http://dbpedia.org/resource/>
PREFIX onto: <http://dbpedia.org/ontology/>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX xdmp: <http://marklogic.com/xdmp#>

SELECT *
WHERE{
?people </posted> ?question .
FILTER (cts:contains(fn:doc(?people), 
cts:path-geospatial-query("/people_data/location",  cts:circle(10, cts:point(59,28)))
)) .
}',
(),
(),
()
)

return (xdmp:elapsed-time())

=======Update========

Question brought over to thread


Solution

  • I see two options here:

    • either you use the geospatial functions that are built into MarkLogic to test for geospatial overlap directly inside SPARQL, preferably against an RDF property rather than a value from a path index (still sub-optimal)
    • or, better, pre-fetch the list of documents matching your geospatial constraint and feed that into your SPARQL as a constraint (this should be highly performant)

    Something along the lines of:

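    (: Resolve the geospatial constraint from the indexes first. This needs the URI lexicon
       and a geospatial path index on /people_data/location. :)
    let $uris := cts:uris((), (), cts:path-geospatial-query("/people_data/location", cts:circle(10, cts:point(59, 28))))
    (: Triple subjects are IRIs, so convert the string URIs before binding them. This assumes
       the subject IRIs equal the document URIs, as fn:doc(?people) in the question implies. :)
    let $people := $uris ! sem:iri(.)
    return sem:sparql('
      SELECT *
      WHERE {
        ?person </posted> ?question .
        FILTER (?person = ?people)
      }
    ', map:entry("people", $people))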
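
    For completeness, the first option (filtering inside SPARQL on RDF properties) might look roughly like the sketch below. It is untested and rests on assumptions: </lat> and </long> are hypothetical predicates that would hold each person's coordinates as numeric literals, and cts:distance is called the same way the question already calls cts: functions from SPARQL. The distance is in miles by default, matching the default unit of cts:circle.

    ```xquery
    xquery version "1.0-ml";
    import module namespace sem = "http://marklogic.com/semantics" at "/MarkLogic/semantics.xqy";

    (: Untested sketch: </lat> and </long> are hypothetical predicates,
       not something taken from the question's data model. :)
    sem:sparql('
      PREFIX cts: <http://marklogic.com/cts#>
      PREFIX xs: <http://www.w3.org/2001/XMLSchema#>

      SELECT ?person ?question
      WHERE {
        ?person </posted> ?question ;
                </lat> ?lat ;
                </long> ?long .
        # keep only people within 10 miles of the point 59,28
        FILTER (cts:distance(cts:point(xs:double(?lat), xs:double(?long)), cts:point(59, 28)) <= 10)
      }
    ')
    ```

    The filter still has to be evaluated for every candidate solution, which is why I would expect this to stay slower than resolving the constraint from the geospatial index up front.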
    

    A slightly more convenient and better-optimized version of the pre-fetch example above would be to rewrite it using the Optic API, which is designed specifically to provide a highly performant way of bridging the gap between the various data models.

    Extrapolating from the above code, I think it would read something like this in Optic code:

    import module namespace op="http://marklogic.com/optic" at "/MarkLogic/optic.xqy";
    
    let $people := op:from-lexicons(
      map:entry("people", cts:uri-reference()),
      "lexicon"
    )
      => op:where(
        cts:path-geospatial-query("/people_data/location", cts:circle(10, cts:point(59,28)))
      )
    
    let $questions := op:from-sparql('SELECT * WHERE { ?person </posted> ?question. }', "sparql")
    
    return $people
      => op:join-inner(
        $questions,
        op:on(
          op:view-col("lexicon", "people"),
          op:view-col("sparql", "person")
        )
      )
      => op:result()
    

    It is a bit hard to test it without proper data and indexes, but I hope it is enough to get you started.
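
    To check the geospatial part in isolation before wiring it into SPARQL or Optic, something like the following could help. It is only a sketch and assumes the URI lexicon is enabled and a geospatial path index exists on /people_data/location; the limit of 10 is arbitrary.

    ```xquery
    xquery version "1.0-ml";

    (: Sketch only: same path, radius and point as in the examples above. :)
    let $geo := cts:path-geospatial-query("/people_data/location", cts:circle(10, cts:point(59, 28)))
    return (
      (: index-only count of matching fragments :)
      xdmp:estimate(cts:search(fn:doc(), $geo)),
      (: a small sample of matching document URIs :)
      cts:uris((), "limit=10", $geo),
      (: time spent so far in this request :)
      xdmp:elapsed-time()
    )
    ```

    If the estimate and sample look right and come back quickly, the geospatial constraint itself is fine and any remaining slowness is on the SPARQL side.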

    You can find introductory documentation on the Optic API here:

    https://docs.marklogic.com/guide/app-dev/OpticAPI
    

    And the API reference can be found here:

    https://docs.marklogic.com/op
    

    HTH!