Tags: rdf, graph-databases, graphdb

Why is GraphDB running out of memory during inferencing?


I have a knowledge graph with roughly 10 billion nodes and am using GraphDB as the triplestore. When uploading the ontology to perform reasoning (using the RDFS-Plus ruleset), I get the error message shown below. Reasoning can also take up to a week.

[Image: out-of-memory error message]

The error is clearly about not having enough memory available, but I'd like to know:

  1. Why is the needed memory less than the available memory?
  2. What is the map index rehash?
  3. Are there any tips for improving load times, aside from the docs here?

Solution

    • What reasoning do you use and why (i.e. do you have a valid need for each reasoning rule)?
    • See https://graphdb.ontotext.com/documentation/10.3/rules-optimisations.html, where the best advice is "don't use reasoning you don't need"
    • What expansion ratio do you expect? Have you tried reasoning on a representative subset of your data? (A query sketch for measuring the ratio follows this list.)
    • It's best to first load your ontologies, then your data. Otherwise you're precipitating a "big bang": adding a few hundred ontology (T-box) statements may cause GraphDB to infer several billion statements. It's better to split the work into chunks, so reasoning happens in parallel with loading your data chunk by chunk (see the LOAD sketch after this list).
    • You'll also want to insert your ontologies as part of a schema transaction (https://graphdb.ontotext.com/documentation/10.3/delete-optimisations.html#schema-transactions), otherwise deleting data will be very slow. A sketch of such a transaction is shown below.
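To estimate the expansion ratio on a representative subset, compare the number of explicitly loaded statements with the number of inferred ones. The two queries below are a minimal sketch; they assume the pseudo-graphs <http://www.ontotext.com/explicit> and <http://www.ontotext.com/implicit> that GraphDB exposes for separating loaded statements from inferred ones.

```sparql
# Count only the statements that were explicitly loaded
SELECT (COUNT(*) AS ?explicitCount)
FROM <http://www.ontotext.com/explicit>
WHERE { ?s ?p ?o }
```

```sparql
# Count only the statements produced by the reasoner;
# expansion ratio = (explicit + implicit) / explicit
SELECT (COUNT(*) AS ?implicitCount)
FROM <http://www.ontotext.com/implicit>
WHERE { ?s ?p ?o }
```

If even a small sample yields a high ratio, you can expect a full load under the same ruleset to produce many billions of inferred statements.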
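For loading the data in chunks once the ontology is in place, one option is a SPARQL 1.1 LOAD per file, each submitted as its own update request so every chunk is committed (and reasoned over) separately. The chunk URL below is a hypothetical placeholder.

```sparql
# Hypothetical chunk URL -- point this at wherever your exported data files live.
# Submit one LOAD per update request so each commit stays small,
# instead of one "big bang" transaction.
LOAD <http://example.org/exports/chunk-0001.ttl>
```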
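For the schema transaction, the linked page describes a special marker statement that tells GraphDB the update contains schema (T-box) data. A minimal sketch, assuming the sys:schemaTransaction marker from those docs and a hypothetical example.org ontology:

```sparql
PREFIX sys:  <http://www.ontotext.com/owlim/system#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX ex:   <http://example.org/onto#>   # hypothetical ontology namespace

INSERT DATA {
  # Marker statement: makes GraphDB treat this update as a schema transaction
  [] sys:schemaTransaction [] .

  # Ontology (T-box) axioms go in the same transaction (hypothetical examples)
  ex:Employee rdfs:subClassOf ex:Person .
  ex:worksFor rdfs:domain     ex:Employee .
}
```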