algorithm sparql owl ontology fuzzy-search

Search algorithm options for ontology querying?


I have developed a tool that enables searching of an ontology I authored. It submits the searches as SPARQL queries.

I have received some feedback that my search implementation is all-or-none, or "binary". In other words, if a user's input doesn't exactly match a term in the ontology, they won't get any hit at all.

I have been asked to add some more flexible, or "advanced" search algorithms. Indexing and bag-of-words searching were suggested.

Can anyone give some examples of implementing search methods on an ontology that don't require a literal match?


Solution

  • First of all, what kind of entities are you trying to match (literals, or string casts of URIs), and what kind of SPARQL queries are you running now? Something like this?

    ?term ?predicate "user input" .
    

    If you are searching across literals, you can make the search more flexible right away by filtering with a case-insensitive regular expression. This will probably make your searches slower, and it won't catch cases where the word tokens are all present but in a different order. In the following example, you should probably constrain the types of ?term and ?predicate first, or even filter ?someLiteral on a string datatype.

    ?term ?predicate ?someLiteral .
    FILTER(regex(?someLiteral, "user input", "i"))
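
    The word-order limitation of a single regular expression can be worked around by emitting one FILTER per word, so that every word must occur but in any order. The following is a minimal Python sketch under stated assumptions, not a definitive implementation: the function name build_token_query and the SELECT shape are hypothetical, and only the variable names come from the pattern above.

```python
import re


def build_token_query(user_input: str) -> str:
    """Build a SPARQL query whose literals must contain every input word, in any order.

    Hypothetical helper for illustration; adapt the graph pattern to your ontology.
    """
    # \w+ keeps only letters, digits and underscores, so no token needs
    # regex escaping or SPARQL string-literal escaping.
    tokens = re.findall(r"\w+", user_input)
    if not tokens:
        raise ValueError("user input contains no searchable words")

    # One case-insensitive FILTER per word; all of them must hold.
    filters = "\n".join(
        f'  FILTER(regex(str(?someLiteral), "{token}", "i"))' for token in tokens
    )
    return (
        "SELECT DISTINCT ?term ?someLiteral WHERE {\n"
        "  ?term ?predicate ?someLiteral .\n"
        "  FILTER(isLiteral(?someLiteral))\n"
        f"{filters}\n"
        "}"
    )


if __name__ == "__main__":
    print(build_token_query("attack heart"))
```

    Each extra FILTER is another scan over the candidate literals, so this is likely to be slower still than a single regular expression; constraining ?predicate to your label properties first should help.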
    

    Several triplestores offer support for full-text searching and result scoring. These are often extensions to the SPARQL language.

    For example, Virtuoso and some others offer a bif:contains predicate. Virtuoso also offers the faceted search web interface (plus a service, I think). I have been pleased with the web-based full-text search in Blazegraph and Stardog, but I can't say anything at this point about using them with a SPARQL query to get a score on a search pattern. Some, such as GraphDB, even support explicit integration with Lucene or Solr*, so you may be able to take advantage of their search languages.

    Finally... are you using a library like the OWL API or RDF4J to access your ontology? If so, you could certainly save the relationships between your terms and any literals in a Java native data structure, and then directly use a fuzzy search component like Lucene to index each literal as a "document" and then search the user input across the index.
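
    To illustrate the "index each literal as a document" idea without any dependencies, here is a rough Python sketch that uses only the standard library. It is a stand-in for Lucene, not Lucene itself: it builds a small inverted index from word tokens to terms and uses difflib for approximate word matching. The class name LabelIndex, the ex: identifiers, and the 0.8 similarity cutoff are all assumptions made for the example.

```python
import difflib
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


class LabelIndex:
    """Toy inverted index: word token -> set of term identifiers (hypothetical)."""

    def __init__(self):
        self.postings = defaultdict(set)

    def add(self, term, literal):
        """Index one literal (label, synonym, definition) as a bag of words."""
        for token in tokenize(literal):
            self.postings[token].add(term)

    def search(self, user_input, cutoff=0.8):
        """Return (term, score) pairs; score = number of input words matched."""
        scores = Counter()
        vocabulary = list(self.postings)
        for word in set(tokenize(user_input)):
            # get_close_matches returns exact matches too (similarity 1.0).
            close = difflib.get_close_matches(word, vocabulary, n=5, cutoff=cutoff)
            matched_terms = set()
            for token in close:
                matched_terms |= self.postings[token]
            for term in matched_terms:
                scores[term] += 1
        return scores.most_common()


if __name__ == "__main__":
    index = LabelIndex()
    index.add("ex:MyocardialInfarction", "myocardial infarction")
    index.add("ex:MyocardialInfarction", "heart attack")
    index.add("ex:HeartFailure", "heart failure")
    # Misspelled, reordered input still ranks the intended term first.
    print(index.search("atack of the hart"))
```

    Because the score counts matched words rather than requiring the whole phrase, word order does not matter and partial matches still appear, just lower in the ranking. A real Lucene index would add stemming, better scoring, and far better performance on large vocabularies.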

    Why don't you post your ontology and give an example of a search you would like to perform in a non-binary way? I (or someone else) can try to show you a minimal implementation.

    *Solr integration appears to be offered only in the commercially licensed version of GraphDB.