
How to use an aggregate function like SUM in a MarkLogic SPARQL query with triples?


I have the following triples:

<?xml  version="1.0" encoding="UTF-8"?>
<sem:triples xmlns:sem="http://marklogic.com/semantics">
  <sem:triple>
    <sem:subject>item1</sem:subject>
    <sem:predicate>hasQty</sem:predicate>
    <sem:object>20</sem:object>
  </sem:triple>
</sem:triples>


<?xml  version="1.0" encoding="UTF-8"?>
<sem:triples xmlns:sem="http://marklogic.com/semantics">
  <sem:triple>
    <sem:subject>item2</sem:subject>
    <sem:predicate>hasQty</sem:predicate>
    <sem:object>5</sem:object>
  </sem:triple>
</sem:triples>

This is the SPARQL query I am using to calculate the sum of these quantities:

select (SUM(?p) as ?p) where { ?s <hasQty> ?p}

The result I get is "0"^^xs:integer instead of 25. Can you suggest what is wrong with this query?


Solution

  • MarkLogic is a very powerful and versatile tool. That said, the way it handles RDF and SPARQL is at least a little non-standard, in my opinion.

    For future reference, the sem:rdf-serialize documentation (https://docs.marklogic.com/sem:rdf-serialize) explains how to convert MarkLogic's native representation of triples into standard RDF.

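    Now, I'm not an XML expert, but I don't think your triples block is well-formed XML as posted: taken as a single document, it has two XML declarations and two root elements. If it were well-formed, you could write an XSLT transformation to turn it into RDF/XML.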

    I did a little manual tidying to get well-formed XML, mainly for illustration purposes:

    <?xml version="1.0" encoding="UTF-8"?>
    <sem:triples xmlns:sem="http://marklogic.com/semantics">
      <sem:triple>
        <sem:subject>item1</sem:subject>
        <sem:predicate>hasQty</sem:predicate>
        <sem:object>20</sem:object>
      </sem:triple>
      <sem:triple>
        <sem:subject>item2</sem:subject>
        <sem:predicate>hasQty</sem:predicate>
        <sem:object>5</sem:object>
      </sem:triple>
    </sem:triples>
    

    As RDF/XML, that might look something like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <rdf:RDF
            xmlns="http://wanna.be/"
            xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    
    <rdf:Description rdf:about="http://wanna.be/item1">
            <hasQty>20</hasQty>
    </rdf:Description>
    
    <rdf:Description rdf:about="http://wanna.be/item2">
            <hasQty>5</hasQty>
    </rdf:Description>
    
    </rdf:RDF>
    

    I created a default namespace of http://wanna.be/, so in SPARQL you can declare an empty prefix and write :hasQty instead of <http://wanna.be/hasQty>. It's a little unusual to use a bare word like <hasQty> as the URI for a term in a SPARQL query.

    The quantities are plain strings rather than numbers, so to get their sum, cast each quantity string to an int and then sum:

    PREFIX : <http://wanna.be/>
    PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
    SELECT (SUM(xsd:int(?o)) AS ?oSum) WHERE { ?s :hasQty ?o }
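
If it helps to see the same cast-then-sum idea outside of SPARQL, here is a minimal Python sketch. It is only an illustration, not how MarkLogic evaluates the query: it reads the tidied sem:triples XML from above with the standard library and adds up the objects of every hasQty triple. The names sum_objects, TRIPLES_XML and SEM_NS are made up for this example.

```python
import xml.etree.ElementTree as ET

# Namespace map for the MarkLogic sem:triples format (prefix chosen for this sketch).
SEM_NS = {"sem": "http://marklogic.com/semantics"}

# The tidied, well-formed triples document from the answer.
TRIPLES_XML = """\
<sem:triples xmlns:sem="http://marklogic.com/semantics">
  <sem:triple>
    <sem:subject>item1</sem:subject>
    <sem:predicate>hasQty</sem:predicate>
    <sem:object>20</sem:object>
  </sem:triple>
  <sem:triple>
    <sem:subject>item2</sem:subject>
    <sem:predicate>hasQty</sem:predicate>
    <sem:object>5</sem:object>
  </sem:triple>
</sem:triples>
"""


def sum_objects(xml_text, predicate):
    """Sum the objects of all triples with the given predicate, casting each to int."""
    root = ET.fromstring(xml_text)
    total = 0
    for triple in root.findall("sem:triple", SEM_NS):
        if triple.findtext("sem:predicate", namespaces=SEM_NS) == predicate:
            # The object is element text such as "20", i.e. a string,
            # so it has to be cast before it can be added up.
            total += int(triple.findtext("sem:object", namespaces=SEM_NS))
    return total


if __name__ == "__main__":
    print(sum_objects(TRIPLES_XML, "hasQty"))
```

The int(...) call plays the same role as xsd:int(?o) in the query: without an explicit cast the values are just text, and there is nothing numeric to sum.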