java sparql jena stardog

Stardog custom aggregate function unavailable in Jena


I've created a custom aggregate function in Stardog that calculates the standard deviation. This works great when you post SPARQL queries to the endpoint or via the query panel in the admin console.

So far, so good, but we're facing a couple of problems. First of all, when we execute a query like the following, it executes perfectly via Stardog but fails in the SPARQL validator (and with the Jena API as well):

PREFIX  :     <http://our/namespace#>
PREFIX  agg:  <urn:aggregate:>
SELECT (agg:stardog:stdev(?age) AS ?stdLMD) (AVG(?age) AS ?avg)
WHERE {
 ?pat a :Person .
 ?pat :age ?age . 
}

Stardog gives the correct results for standard deviation and average age, but the SPARQL validator throws an exception:

Non-group key variable in SELECT: ?age in expression (?age)

Does Stardog interpret the specification differently or is this a feature I'm unaware of?

Another problem: we're using a custom aggregate function (stdev) in a CONSTRUCT query, and again that seems to be working fine via the Stardog APIs. Most of our code, though, is based on Jena, and it doesn't seem to recognize the custom stdev function. I guess that's because this extension is Stardog-specific and unavailable in Jena? Let me show an example. At the moment, we're executing CONSTRUCT queries via the following Jena code:

final Query dbQuery = QueryFactory.create(query.getContent());
final QueryExecution queryExec = QueryExecutionFactory.create(dbQuery, model);
queryExec.execConstruct(infModel);

As long as we're not using the aggregate function, this works like a charm. As we're constructing triples in multiple named graphs, it's very convenient to have a model available as well (which represents a named graph).

I would like to do something similar with the Stardog java API. I've only gotten as far as:

UpdateQuery dbQuery;
try {
    dbQuery = connection.update(query.getContent());
    dbQuery.execute();
} catch (final StardogException e) {
    LOGGER.error("Cannot execute CONSTRUCT query", e);
}

Problem is that you explicitly need to specify which named graph you want to manipulate in the CONSTRUCT query. There's nothing like a Jena model that represents a part of the database so that we can avoid specifying it in the query. What would be a good approach here?

So my question is twofold: why are queries parsed differently in Stardog and is it possible to have Jena detect the custom Stardog aggregate functions? Thanks!

UPDATE

In the end, what we're trying to accomplish is to execute a CONSTRUCT query over a given named graph, but write the newly constructed triples to a different graph. In my Jena example, you can see that I'm working with two Jena models to accomplish that. How would you do this with the SNARL API? I've gotten as far as the following code snippet, but this only defines the dataset the query will be executed against, not where the triples will be written to. Any help on this is still appreciated!

UpdateQuery dbQuery;
try {
    dbQuery = connection.update(query.getContent());
    final DatasetImpl ds = new DatasetImpl();
    ds.addNamedGraph(new URIImpl(infDatasource));
    dbQuery.dataset(ds);
    dbQuery.execute();
} catch (final StardogException e) {
    LOGGER.error("Cannot execute CONSTRUCT query", e);
}

Solution

  • The likely reason for the error

    Non-group key variable in SELECT: ?age in expression (?age)

    is that the SPARQL validator, and ARQ, have no idea that agg:stardog:stdev is an aggregate and do not interpret it that way. To them, the syntax is no different from a standard projection expression such as (?x + ?y as ?sum), as AndyS noted, so ?age looks like an ungrouped variable used alongside AVG.

    While the SPARQL spec doesn't quite preclude custom aggregates, they're not accounted for in the grammar itself. Both Stardog and Jena allow custom aggregates, albeit in different ways.

    Another problem, we're using a custom aggregate function (stdev) in a CONSTRUCT query and again that seems to be working fine via the Stardog APIs. Most of our code though is based on Jena, and it doesn't seem to recognize the custom stdev function. I guess because this extension is only Stardog related and unavailable for Jena?

    Yes, Jena and Stardog are distinct. Anything custom you've defined in Stardog, such as a custom aggregate, won't be available directly in Jena.

    You might be constructing the model in such a way that Jena, via ARQ, is the query engine as opposed to Stardog. That would explain why you get exceptions that Jena doesn't know about the custom aggregate you've defined within Stardog.

    There's nothing like a Jena model that represents a part of the database so that we can avoid specifying it in the query. What would be a good approach here?

    You can specify the active graph of a query programmatically via the SNARL API using dataset.
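The dataset call only controls which graph the query reads from. For the write side, standard SPARQL 1.1 Update lets the target graph be named in the INSERT template while USING selects the source graph. The following is a minimal sketch of building such an update string, not a Stardog-specific API: the class name, method name, and graph IRIs are hypothetical, and it assumes the IRIs and patterns are trusted strings that need no escaping.

```java
// Hypothetical helper: turns a CONSTRUCT-style template and WHERE pattern
// into a SPARQL 1.1 Update that reads from one named graph and writes to another.
class GraphToGraphUpdate {

    static String insertInto(String prefixes,
                             String sourceGraph,
                             String targetGraph,
                             String template,
                             String where) {
        return prefixes + "\n"
            // GRAPH <target> in the template decides where the new triples are written
            + "INSERT { GRAPH <" + targetGraph + "> { " + template + " } }\n"
            // USING <source> makes the source graph the default graph for the WHERE clause
            + "USING <" + sourceGraph + ">\n"
            + "WHERE { " + where + " }";
    }

    public static void main(String[] args) {
        // The graph IRIs below are placeholders, not values from the question.
        String update = insertInto(
            "PREFIX : <http://our/namespace#>",
            "urn:example:source",
            "urn:example:inferred",
            "?pat :hasAge ?age .",
            "?pat a :Person . ?pat :age ?age .");
        System.out.println(update);
    }
}
```

The resulting string could then be handed to connection.update(...) as in the question's snippet, so that both the read graph and the write graph are stated in the query text itself.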

    So my question is twofold: why are queries parsed differently in Stardog and is it possible to have Jena detect the custom Stardog aggregate functions? Thanks!

    They're parsed differently because there's no standard way of defining a custom aggregate and Stardog & Jena choose to implement it differently. Further, Jena would not be aware of Stardog's custom aggregates and vice versa.
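If a query has to keep running through ARQ, one possible fallback is to select the raw ?age values and aggregate them on the client. The sketch below assumes the values have already been read from the result set into a list; the class and method names are hypothetical, and it computes the sample standard deviation (dividing by n - 1), which may or may not match what the custom Stardog aggregate does.

```java
import java.util.List;

// Hypothetical client-side replacement for the custom stdev aggregate.
class ClientSideStdev {

    // Sample standard deviation (n - 1 in the denominator).
    static double sampleStdev(List<Double> values) {
        int n = values.size();
        if (n < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double mean = 0.0;
        for (double v : values) {
            mean += v;
        }
        mean /= n;

        double sumOfSquares = 0.0;
        for (double v : values) {
            double d = v - mean;
            sumOfSquares += d * d;
        }
        return Math.sqrt(sumOfSquares / (n - 1));
    }

    public static void main(String[] args) {
        // Placeholder ages, not data from the question.
        System.out.println(sampleStdev(List.of(31.0, 45.0, 52.0, 60.0)));
    }
}
```

This keeps the query itself portable across both engines, at the cost of shipping every value to the client instead of a single aggregated row.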