sparql, semantic-web, dbpedia, rdfs, sparqlwrapper

How to search for rdfs:labels in dbpedia which are partial matches to a given term using SPARQL?


I am using this query to search for all labels that contain the string "Medi":

select distinct ?label where 
{ 
    ?concept rdfs:label  ?label 
    filter contains(?label,"Medi") 
    filter(langMatches(lang(?label),"en")) 
}

However, as soon as I change the term from "Medi" to "Medicare", the query stops working and times out. How do I get it to work with longer words like "Medicare", i.e. extract all labels that contain the word "Medicare"?


Solution

  • Your query has to iterate over all labels in DBpedia, which is quite a large number, and then apply a string containment check to each one. This is indeed expensive.

    Even a count query leads to an "estimated timeout error":

    select count(?label) where 
    { 
        ?concept rdfs:label  ?label 
        filter(regex(str(?label),"Medi")) 
        filter(langMatches(lang(?label),"en")) 
    }
    

    Two options:

    1. Virtuoso has some full-text search support via bif:contains:

      SELECT DISTINCT ?label WHERE { 
        ?concept rdfs:label ?label .
        ?label bif:contains "Medicare"
        FILTER(langMatches(lang(?label),"en"))
      }
      
    2. The public DBpedia endpoint is shared, so it limits the estimated execution time of each query. The alternative is to load the DBpedia dataset into your own triple store, e.g. Virtuoso, where you can raise the maximum estimated execution time parameter yourself.
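
If you want to run option 1 from Python, here is a minimal sketch using only the standard library rather than SPARQLWrapper. The helper names (build_label_query, fetch_labels), the result limit and the timeout value are my own assumptions for illustration. It also assumes the endpoint accepts the `format` request parameter, as Virtuoso endpoints generally do. The search term is wrapped in single quotes inside the literal so that Virtuoso treats it as one phrase.

```python
import json
import urllib.parse
import urllib.request
from typing import List

# Public DBpedia endpoint (backed by Virtuoso); swap in your own instance if you host one.
ENDPOINT = "https://dbpedia.org/sparql"


def build_label_query(term: str, lang: str = "en", limit: int = 100) -> str:
    """Build a SPARQL query that uses Virtuoso's bif:contains full-text search."""
    # Replace characters that could end the SPARQL literal or the quoted phrase.
    cleaned = term.replace("\\", " ").replace('"', " ").replace("'", " ").strip()
    if not cleaned:
        raise ValueError("search term is empty")
    if not lang.replace("-", "").isalnum():
        raise ValueError("unexpected language tag")
    # Single quotes inside the literal make Virtuoso treat the term as one phrase.
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?label WHERE {\n"
        "  ?concept rdfs:label ?label .\n"
        f"  ?label bif:contains \"'{cleaned}'\" .\n"
        f"  FILTER(langMatches(lang(?label), \"{lang}\"))\n"
        "}\n"
        f"LIMIT {int(limit)}"
    )


def fetch_labels(term: str, timeout_s: float = 30.0) -> List[str]:
    """Send the query to the endpoint and return the matching label strings."""
    params = urllib.parse.urlencode(
        {"query": build_label_query(term), "format": "application/sparql-results+json"}
    )
    request = urllib.request.Request(
        f"{ENDPOINT}?{params}",
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        payload = json.load(response)
    # Standard SPARQL JSON results layout: results -> bindings -> variable -> value.
    return [row["label"]["value"] for row in payload["results"]["bindings"]]
```

Calling fetch_labels("Medicare") should return up to 100 English labels containing that word, provided the endpoint is reachable. Note that bif:contains is a Virtuoso extension, so this query will not work against other triple stores.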