indexing rdf semantic-web triplestore

Incremental indexing for semantic search


I wonder whether there are any standards or best practices for incrementally indexing a triple store for semantic search.

To support semantic search, one usually uses Solr or Elasticsearch, where resources are indexed according to some specific SPARQL query. One can re-index the entire resource set, say once a day, but that is not very desirable, hence the need to do it incrementally. That, however, requires tracking changes somehow, with the ultimate goal of indexing or deleting only what has changed.

For instance, to index only what has changed, the SPARQL query would need to include some kind of timestamp filter.
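To make the timestamp idea concrete, here is a minimal sketch in Python that builds such a query. It assumes every resource records its last change in `dcterms:modified`, which is only true if whatever writes to the store maintains that property; `build_delta_query` and `QUERY_TEMPLATE` are names made up for this illustration.

```python
from datetime import datetime, timezone

# Assumption: each resource carries a dcterms:modified timestamp that the
# writing application keeps up to date. Nothing in the store does this for you.
QUERY_TEMPLATE = """\
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?resource ?modified
WHERE {{
  ?resource dcterms:modified ?modified .
  FILTER (?modified > "{since}"^^xsd:dateTime)
}}
ORDER BY ?modified
"""


def build_delta_query(last_indexed: datetime) -> str:
    """Return a SPARQL query selecting resources modified after last_indexed."""
    if last_indexed.tzinfo is None:
        raise ValueError("last_indexed must be timezone-aware")
    # Normalise to UTC so the literal compares consistently in the store.
    since = last_indexed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return QUERY_TEMPLATE.format(since=since)


if __name__ == "__main__":
    # Hypothetical checkpoint: the time the previous indexing run started.
    print(build_delta_query(datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)))
```

Note that a filter like this only sees resources that still exist. A resource that was deleted outright leaves no timestamp behind, so deletions need a separate mechanism, such as a change log or tombstone records.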

If anyone has suggestions or experience with this that they would like to share, it would be much appreciated.

So far I have drawn some inspiration from the EEA ElasticSearch RDF River Plugin. I am also looking at the Changeset Ontology.


Solution

  • The easiest way to accomplish this would be to hook into the transaction lifecycle. You can then see the changes made to the database, which gives you the graph that needs to be indexed.

    But don't dismiss doing a full re-index on a periodic schedule, such as nightly. Unless your requirement is that full-text searches must always be against the most recent data and your data changes quickly, a full re-index on a regular basis will work just fine.
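As a rough illustration of the transaction-lifecycle approach, the sketch below takes the triples added and removed by one committed transaction and works out which resources to re-index and which to drop from the search index. It is not tied to any particular triple store: `Triple`, `plan_index_updates`, and the `still_has_triples` lookup are hypothetical names, and how you actually obtain the added and removed triples depends on the hooks your store exposes.

```python
from typing import Callable, Iterable, NamedTuple, Set, Tuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


def plan_index_updates(
    added: Iterable[Triple],
    removed: Iterable[Triple],
    still_has_triples: Callable[[str], bool],
) -> Tuple[Set[str], Set[str]]:
    """Split the subjects touched by a transaction into (to_reindex, to_delete).

    still_has_triples is a hypothetical callback that asks the store, after
    the commit, whether a subject still has any triples at all.
    """
    touched = {t.subject for t in added} | {t.subject for t in removed}
    # A touched subject with nothing left in the store is gone from the index too.
    to_delete = {s for s in touched if not still_has_triples(s)}
    # Everything else changed in some way and gets rebuilt from the store.
    to_reindex = touched - to_delete
    return to_reindex, to_delete


if __name__ == "__main__":
    # Stand-in for the store's state after the commit.
    subjects_in_store = {"ex:a"}
    added = [Triple("ex:a", "ex:label", "new label")]
    removed = [Triple("ex:a", "ex:label", "old label"), Triple("ex:b", "ex:label", "gone")]
    reindex, delete = plan_index_updates(added, removed, lambda s: s in subjects_in_store)
    print("re-index:", sorted(reindex))
    print("delete:", sorted(delete))
```

Working at the level of whole subjects keeps the indexer simple: each touched resource is rebuilt from the store's current state rather than patched field by field, at the cost of some redundant work when only one property changed.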