
SPARQL: how to deal with differently cased queries?


I am still a bit new to SPARQL. I have set up a DBpedia endpoint for our company. I have no idea what the end user will be querying and, since DBpedia is case sensitive, I pass both title case & uppercase versions for subjects vs. something like a person; e.g. "Computer_programming" vs. "Alicia_Keys". Rather than pass in 2 separate queries, what is the most efficient way to achieve this? I've tried the IN operator (from this question) but I seem to be failing somewhere.

select ?label ?abstract where {
   IN (<http://dbpedia.org/resource/alicia_keys>, <http://dbpedia.org/resource/Alicia_Keys>) rdfs:label ?label;
               dbpedia-owl:abstract ?abstract.
}
LIMIT 1

Solution

  • since DBpedia is case sensitive, I pass both title case & uppercase versions for subjects vs. something like a person; e.g. "Computer_programming" vs. "Alicia_Keys". Rather than pass in 2 separate queries, what is the most efficient way to achieve this?

    URIs should be viewed as opaque. While DBpedia generally has some nice structure, so that you can get lucky by concatenating http://dbpedia.org/resource/ and some string with _ replacing spaces, that's really not a very robust way to do something. A better idea is to note that the string you're getting is probably the same as the label of some resource, modulo variations in case. Given that, the best idea would be to look for something with the same label, modulo case. E.g.,

    select ?resource where {
      values ?input { "AliCIA KeYS" }
    
      ?resource rdfs:label ?label .
      filter ( ucase(str(?label)) = ucase(?input) )
    }
    

    That's actually going to be pretty slow, though, because the endpoint has to find every resource with a label and do some string processing on each label. It's an OK approach in principle, though.
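If the input string comes from an application, the case-insensitive query above can be assembled from it. The following is only a minimal Python sketch, assuming the user's text is spliced into a double-quoted SPARQL literal; the helper names `escape_sparql_literal` and `build_label_query` are made up for illustration and are not part of any library.

```python
def escape_sparql_literal(text: str) -> str:
    """Escape characters that would break a double-quoted SPARQL string literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def build_label_query(user_input: str) -> str:
    """Build a query that matches labels equal to the input, ignoring case."""
    literal = escape_sparql_literal(user_input.strip())
    # Doubled braces are literal braces inside an f-string.
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?resource WHERE {{
  VALUES ?input {{ "{literal}" }}
  ?resource rdfs:label ?label .
  FILTER ( ucase(str(?label)) = ucase(?input) )
}}"""


query = build_label_query("AliCIA KeYS")
print(query)
```

The resulting string can be handed to whichever client you already use, for example SPARQLWrapper's `setQuery`.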

    What can be done to make it better? Well, if you know what kind of thing you're looking for, that will help a lot. E.g., you could restrict the query to Persons:

    select distinct ?resource where {
      values ?input { "AliCIA KeYS" }
    
      ?resource rdf:type dbpedia-owl:Person ;
                rdfs:label ?label .
      filter ( ucase(str(?label)) = ucase(?input) )
    }
    

    That's an improvement, but it's still not all that fast. It still, at least conceptually, has to touch each Person and examine their name. Some SPARQL endpoints support text indexing, and that's probably what you need if you want to do this efficiently.
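As one illustration of text indexing: the public DBpedia endpoint runs on Virtuoso, which exposes its free-text index through the non-standard `bif:contains` predicate. The sketch below assumes your own endpoint is also Virtuoso with the text index enabled; `build_text_index_query` is a hypothetical helper, and the exact quoting rules for the search phrase should be checked against the Virtuoso documentation before relying on this.

```python
def build_text_index_query(user_input: str) -> str:
    """Build a Virtuoso-specific query: index lookup first, exact check second."""
    # Collapse runs of whitespace so the phrase is a clean sequence of words.
    phrase = " ".join(user_input.split())
    # Keep the sketch simple: refuse input that would need escaping.
    if not phrase or any(ch in phrase for ch in "'\"\\"):
        raise ValueError("input must be non-empty, without quotes or backslashes")
    # bif:contains is a Virtuoso extension (assumption: the endpoint is Virtuoso);
    # other SPARQL engines will reject this query.
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?resource WHERE {{
  ?resource rdfs:label ?label .
  ?label bif:contains "'{phrase}'" .
  FILTER ( ucase(str(?label)) = ucase("{phrase}") )
}}"""


text_query = build_text_index_query("AliCIA   KeYS")
print(text_query)
```

The idea is that the index narrows the candidates to labels containing the phrase, and the `FILTER` from the earlier queries then only has to run over that small set.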

    The best option, of course, would be to simply ask your users for a little bit more information, and to normalize the data in advance. If your user provides "AliCIA KEyS", then you can do the normalization to "Alicia Keys"@en, and then do something like:

    select distinct ?resource where {
      values ?input { "Alicia Keys"@en }
      ?resource rdfs:label ?input .
    }
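The normalization step could be sketched in Python roughly as follows. This is an assumption-laden illustration, not a definitive recipe: it guesses that labels are either title case (as for people, "Alicia Keys") or sentence case (as for subjects, "Computer programming"), and it passes both guesses in a single `VALUES` block so only one query is needed. The names `candidate_labels` and `build_exact_label_query` are hypothetical.

```python
def candidate_labels(user_input: str) -> list[str]:
    """Return likely label spellings for the input: title case and sentence case."""
    words = user_input.split()
    if not words:
        return []
    title_case = " ".join(word.capitalize() for word in words)
    sentence_case = " ".join(words).capitalize()
    candidates = []
    for candidate in (title_case, sentence_case):
        if candidate not in candidates:  # single-word input yields one candidate
            candidates.append(candidate)
    return candidates


def build_exact_label_query(user_input: str, lang: str = "en") -> str:
    """Build one exact-match query covering every candidate spelling."""
    labels = candidate_labels(user_input)
    if not labels:
        raise ValueError("input must contain at least one word")
    # Escape backslashes and double quotes before splicing into literals.
    escaped = [label.replace("\\", "\\\\").replace('"', '\\"') for label in labels]
    values = " ".join(f'"{label}"@{lang}' for label in escaped)
    return f"""PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?resource WHERE {{
  VALUES ?input {{ {values} }}
  ?resource rdfs:label ?input .
}}"""


print(build_exact_label_query("AliCIA KEyS"))
```

Because the labels are matched exactly, the endpoint can use its ordinary indexes instead of running a string function over every label. The casing rules here are naive (names such as "McDonald" or "van Gogh" will not survive them), so treat the candidate list as a starting point.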