sparql wikidata

How to query for people using Wikidata and SPARQL?


I'm new to SPARQL, and to Wikidata for that matter. I'm trying to let my users search Wikidata for people, and people only; I don't want any results to be a motorcycle brand or anything else.

So I was playing around over here with the following query:

SELECT ?person ?personLabel WHERE {
  ?person wdt:P31 wd:Q5.
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en".
    ?person rdfs:label ?personLabel .
  }
  FILTER regex(?personLabel, "Albert", "i").
}
LIMIT 10

Although this eventually returns a result, it is nowhere near as fast as I'd like. It also simply times out if you try the same query with a longer name.

All the example queries I could find assume that you already have an entity to query from. In my case there is nothing to start from, since I'm trying to find someone by name. I'm probably making some wrong assumptions about the inner workings of the database, but I'm not sure what they are.

Any ideas?


Solution

  • The problem with doing a free text search on Wikidata is that it does not have a free text index (yet). Without an index, a text search has to try a match against every label, which is not efficient. I could not come up with a query that searches for "Albert Einstein" and does not time out. An exact match (?person rdfs:label "Albert Einstein"@en .) does work, of course, but presumably that doesn't fit your needs. It would help if you could narrow down the selection of people in some other way first.

    DBpedia (http://dbpedia.org/sparql), on the other hand, has Virtuoso's bif:contains available, so this works quite fast there (http://yasgui.org/short/HJeZ4kjp):

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT * WHERE {
      ?sub a foaf:Person .
      ?sub rdfs:label ?lbl .
      ?lbl bif:contains "Albert AND Einstein" .
      filter(langMatches(lang(?lbl), "en"))
    } 
    LIMIT 10
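
    For comparison, the exact-match variant mentioned above could look roughly like this on Wikidata. This is only a sketch: it assumes the query is run on the Wikidata Query Service, where the wd:, wdt:, rdfs:, wikibase: and bd: prefixes are predeclared, and it reuses the instance-of-human restriction (wdt:P31 wd:Q5) from the question. It only finds items whose English label is exactly "Albert Einstein", so it is not a substitute for free text search.

    ```sparql
    SELECT ?person ?personLabel WHERE {
      # Exact label lookup: matches only the full English label, not substrings.
      ?person rdfs:label "Albert Einstein"@en .
      # Keep only items that are instances of human (Q5), as in the question.
      ?person wdt:P31 wd:Q5 .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    LIMIT 10
    ```

    Putting the label triple first reflects the idea of starting from the most selective pattern; the query engine may reorder the patterns anyway.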