Tags: sparql, graphdb

GraphDB queries fail silently (OutOfMemoryError)


I'm dealing with a pretty large repository (~16M statements). I'm trying to get a list of all distinct class-membership patterns in my graph. Here is the query:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

select distinct (group_concat(?t) as ?fo)
where { 
    ?s rdf:type ?t.    
}
group by (?s)
order by ?fo

Of course, such a query should return results. Strangely enough, sometimes I get the results I need, and sometimes the query returns nothing at all.

Why is this happening, and how can I avoid it?

PS: While monitoring the query, I noticed that whenever I get no data, the query status stays stuck on:

IN_HAS_NEXT

0 operations

until the query finishes.

Update:

Here is the main log describing the problem. The output in the GraphDB Workbench is:

No results. Query took 1m 53s, minutes ago.

There is no mention of errors, which is quite strange. As Gilles-Antoine Nys pointed out, it is a problem with memory and Java's GC. Overall, I think the workbench should explicitly show an error message in such cases.


Solution

  • As the other comments have already suggested, the error is caused by an OutOfMemoryError (OOM). The upcoming GraphDB 8.6 release includes a much more memory-efficient implementation of all aggregates and DISTINCT. Until its public release, there are only a few options to try:

    1. Decrease the amount of memory consumed by writing a slightly more optimal version of your query, which displays only the local names (shorter strings mean smaller group_concat buffers):
    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    select ?s_local (group_concat(?t_local) as ?fo)
    where {
        ?s rdf:type ?t.
        BIND (REPLACE(STR(?t), "^(.*)(/|#)([^#/]*)$", "$3") as ?t_local)
        BIND (REPLACE(STR(?s), "^(.*)(/|#)([^#/]*)$", "$3") as ?s_local)
    } group by (?s_local)
    order by (?fo)
    
    2. Increase the RAM available for performing all aggregate calculations. You can do this either by passing a higher value to the JVM's -Xmx parameter or by setting a minimal value for graphdb.page.cache.size in graphdb.properties, e.g. graphdb.page.cache.size=100M. This cache controls how many pages are kept in memory, which for your query and repository size won't make much of a difference. See the sketch after this list.

    3. Limit the maximum length of the group_concat string with -Dgraphdb.engine.function.concat.max-length=4096. The limit won't make the query execute successfully, but it will indicate whether the problem is too many subjects or too-long strings (also shown in the sketch below).
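
As an illustration of options 2 and 3, here is a minimal sketch of the settings involved. The heap size, the config file location, and the GDB_JAVA_OPTS environment variable are assumptions about a typical standalone GraphDB setup; check your own distribution's startup script before copying it:

# conf/graphdb.properties (location is an assumption):
# keep the page cache small so more heap remains for aggregation
graphdb.page.cache.size=100M

# Shell, before starting GraphDB: raise the JVM heap and cap group_concat length.
# GDB_JAVA_OPTS and -Xmx8g are example values, not recommendations.
export GDB_JAVA_OPTS="-Xmx8g -Dgraphdb.engine.function.concat.max-length=4096"
./bin/graphdb

The idea, per the answer above, is that hitting the concat limit quickly points to overly long group strings, while still running out of memory points to too many groups.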