openrefine graphdb

Setting up an OpenRefine reconciliation endpoint over GraphDB


Recent versions of GraphDB offer an integrated OpenRefine tool, with its all-important support for reconciling data against existing vocabularies, provided they are exposed via an OpenRefine-compliant reconciliation API that you can then call from GraphDB/OpenRefine. Following a few hints I picked up from recent GraphDB talks, I expected that such a reconciliation API would also be exposed automatically over the data in GraphDB itself (possibly involving the Lucene connector), so that you could reconcile new tabular data against the entities that are already in your RDF graph. Unfortunately, I can't find any information about such support either in the docs or in the most recent GraphDB release. Is there any straightforward way of setting up such a service over RDF data / a SPARQL endpoint? Thanks in advance for any tips.


Solution

  • OntoRefine does not have built-in reconciliation servers yet. However, we are working on this as part of this project: https://www.ontotext.com/knowledgehub/current/cima-project/. We already have a VIAF recon server that we are considering making available as a free service, as well as a more generic way to set up recon over RDF data that uses Elastic for scoring.

    (The grefine rdf extension is not good for this purpose: it has no scoring, and you can't even tell it which Lucene index to use.)

    UPDATE Sep 2020:

    • We've developed a VIAF recon server that's much better than previously existing ones. It takes into account name variants, parses out nationality and occupation, and sorts candidates by some "importance" metrics. We have not yet deployed this as we're looking for a client.
    • You can implement recon over RDF data, using the same framework that the above VIAF server uses (which is based on mapping RDF props to Lucene/Elastic and using its "similar" functionality). Again, we're looking for a client or pretext to release this framework as part of GraphDB.
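
Until such a framework is released, the general shape of a recon service over RDF data can be sketched by hand. The Python below is a minimal sketch under several assumptions, not the framework described above: it assumes entities carry an `rdfs:label`, it swaps in plain string similarity from the standard library for the Lucene/Elastic scoring mentioned in the answer, and the function names `build_label_query`, `score` and `reconcile` are hypothetical. It only builds the SPARQL text and the result object in the shape OpenRefine's reconciliation API conventionally expects (`id`, `name`, `type`, `score`, `match`); sending the query to a SPARQL endpoint and serving the response over HTTP are left out.

```python
from difflib import SequenceMatcher

RDFS = "http://www.w3.org/2000/01/rdf-schema#"


def build_label_query(text, limit=10):
    """Build a SPARQL query for entities whose rdfs:label contains `text`.

    Assumption: labels are stored as rdfs:label; adjust for your graph.
    """
    # Escape backslashes and double quotes so the text is a valid SPARQL literal.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"PREFIX rdfs: <{RDFS}>\n"
        "SELECT DISTINCT ?entity ?label WHERE {\n"
        "  ?entity rdfs:label ?label .\n"
        f'  FILTER(CONTAINS(LCASE(STR(?label)), LCASE("{escaped}")))\n'
        f"}} LIMIT {int(limit)}"
    )


def score(query, label):
    """Case-insensitive similarity on a 0-100 scale (stand-in for index scoring)."""
    ratio = SequenceMatcher(None, query.lower(), label.lower()).ratio()
    return round(100 * ratio, 1)


def reconcile(query, rows, match_threshold=95.0):
    """Turn (uri, label) rows into a reconciliation result object.

    `rows` is whatever the SPARQL endpoint returned for build_label_query();
    `match_threshold` is an arbitrary cut-off chosen for this sketch.
    """
    candidates = [
        {"id": uri, "name": label, "type": [], "score": score(query, label)}
        for uri, label in rows
    ]
    # Best candidates first, as OpenRefine shows them in that order.
    candidates.sort(key=lambda c: c["score"], reverse=True)
    for candidate in candidates:
        candidate["match"] = candidate["score"] >= match_threshold
    return {"result": candidates}
```

For a batch request, a service would call `reconcile` once per query key (`q0`, `q1`, ...) and return a JSON object keyed the same way. Because the scoring here is only character-level similarity, it will not handle name variants or rank candidates by importance the way the VIAF server described above does.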