Search code examples
sparqlwikidatawikidata-query-service

SPARQL query running too slow, when querying, sometimes timesout, is there a way to optimize the query?


My Wikidata Query Service query sometimes takes 35000 milliseconds or 35 seconds to complete. I'm not very adept at SPARQL. The query below does work (except it does duplicates sometimes). I'm wanting to get "famous people" by providing a birthday date and month, of which I get their name, birthday, an image of them (Wikimedia), and occupations. I'm also filtering by persons with birth places in the USA and UK.

I've added a variable called "sitelinks" that I count how many links are dedicated to them as a popularity metric (I'm open to better ideas on this, if there is a better way to gauge popularity). Is there a way to make this more optimized? Again, the query works, it's just slow.

PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>

SELECT ?person ?personLabel ?birthdate ?countryLabel (COUNT(DISTINCT(?sitelink)) as ?sites) (GROUP_CONCAT(DISTINCT ?occupationLabel; separator=", ") as ?occupations) (SAMPLE(?image) as ?uniqueImage)
WHERE {
  ?person wdt:P31 wd:Q5 ;  # Instance of human
      wdt:P569 ?birthdate ;  # Date of birth
      wdt:P27 ?country ;  # Citizenship
      wdt:P106 ?occupation ;  # Occupation
      wdt:P18 ?uniqueImage .  # Image
     
  ?country rdfs:label ?countryLabel .
  ?occupation rdfs:label ?occupationLabel .
  ?sitelink schema:about ?person .
  FILTER(LANG(?countryLabel) = "en")
  FILTER(LANG(?occupationLabel) = "en")

  FILTER(MONTH(?birthdate) = 5 && DAY(?birthdate) = 20)
  FILTER(?country IN (wd:Q30, wd:Q145))

  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
GROUP BY ?person ?personLabel ?birthdate ?countryLabel ?uniqueImage ORDER BY DESC(?sites)
LIMIT 50

If anyone wants to paste the query to the Wikidata Query Service, here is the link https://query.wikidata.org/


Solution

  • I think it's the site count that's killing your query and I think I resolved the duplicate record issue. Would something like this work for your use-case?

    PREFIX wd: <http://www.wikidata.org/entity/>
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    
    SELECT 
    ?person 
    ?personLabel 
    ?birthdate 
    ?countryLabel 
    ?website
    (GROUP_CONCAT(DISTINCT ?occupationLabel; separator=", ") AS ?occupations) 
    (MIN(?image) AS ?uniqueImage)
    WHERE {
      ?person wdt:P31 wd:Q5 ;  # Instance of human
              wdt:P569 ?birthdate ;  # Date of birth
              wdt:P27 ?country .  # Citizenship
              
      OPTIONAL { ?person wdt:P856 ?website }  # Official website
      
      ?country rdfs:label ?countryLabel .
      FILTER(?country IN (wd:Q30, wd:Q145))
      FILTER(LANG(?countryLabel) = "en")
      
      FILTER(DATATYPE(?birthdate) = xsd:dateTime 
            && MONTH(?birthdate) = 5 
            && DAY(?birthdate) = 20)
       
      OPTIONAL { 
        ?person wdt:P106 ?occupation .  # Occupation
        ?occupation rdfs:label ?occupationLabel .
        FILTER(LANG(?occupationLabel) = "en")
      }
      
      OPTIONAL { ?person wdt:P18 ?image }  # Image
    
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
    GROUP BY ?person ?personLabel ?birthdate ?countryLabel ?website
    ORDER BY DESC(?website)
    LIMIT 50
    

    Edit

    If you do need the site count though you might be able to use something like this

    PREFIX wd: <http://www.wikidata.org/entity/>
    PREFIX wdt: <http://www.wikidata.org/prop/direct/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    
    SELECT 
    ?person 
    ?personLabel 
    ?birthdate 
    ?countryLabel 
    (COUNT(DISTINCT ?sitelink) AS ?sites) 
    (GROUP_CONCAT(DISTINCT ?occupationLabel; separator=", ") AS ?occupations) 
    (MIN(?image) AS ?uniqueImage)
    WHERE {
      ?person wdt:P31 wd:Q5 ;  # Instance of human
              wdt:P569 ?birthdate ;  # Date of birth
              wdt:P27 ?country .  # Citizenship
              
      OPTIONAL { ?person wdt:P18 ?image }  # Image
    
      ?sitelink schema:about ?person .
      
      ?country rdfs:label ?countryLabel .
      FILTER(?country IN (wd:Q30, wd:Q145))
      FILTER(LANG(?countryLabel) = "en")
      
      FILTER(DATATYPE(?birthdate) = xsd:dateTime 
            && MONTH(?birthdate) = 5 
            && DAY(?birthdate) = 20)
       
      OPTIONAL { 
        ?person wdt:P106 ?occupation .  # Occupation
        ?occupation rdfs:label ?occupationLabel .
        FILTER(LANG(?occupationLabel) = "en")
      }
      
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
    GROUP BY ?person ?personLabel ?birthdate ?countryLabel
    ORDER BY DESC(?sites)
    LIMIT 50