database sparql wikipedia wikidata

SPARQL Wikidata: how to select only English Wikipedia articles and avoid duplicate results?


I am a beginner at using SPARQL with Wikidata. I use it to get a list of people's data with a particular date of death, using this query:

SELECT ?human ?humanLabel ?humanDescription ?gender ?birth_date ?death_date ?bplace ?dplace ?img ?prof ?profLabel ?article WHERE {
  ?human wdt:P31 wd:Q5;
    wdt:P18 ?img;
    wdt:P19 ?bplace;
    wdt:P20 ?dplace;
    wdt:P21 ?gender;
    wdt:P569 ?birth_date;
    wdt:P570 ?death_date;
    #rdfs:label ?name;
    #schema:description  ?description;
    wdt:P106 ?prof.
  ?article schema:about ?human .  
  ?article schema:inLanguage "en".
  FILTER (year(?death_date) = 2020)
  #FILTER(!REGEX(STR(?article), "^<https://en.wikipedia.org/"))
  SERVICE wikibase:label {bd:serviceParam wikibase:language "en".}
}
ORDER BY ASC(?death_date)


But the response contains duplicates, which makes it bigger than necessary. I also need to receive only entries that have an article on en.wikipedia.org, but adding FILTER(!REGEX(STR(?article), "^<https://en.wikipedia.org/")) makes the query much slower.

How could I solve it?


Solution

  • Your query has some problems:

    The resulting query runs in less than 30 seconds: https://w.wiki/UN8

    Generally speaking, it's best to first write a simple, fast query that returns all the items you want, then load the other data for those items, and only at the end load labels and descriptions.
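
    A minimal sketch of that staging, not the query behind the short link above: the inner subquery only selects the people, the outer part then attaches the article, and the label service comes last. The date-range filter and the schema:isPartOf restriction to English Wikipedia are my assumptions about how to keep each step cheap; I have not measured the runtime.

    ```sparql
    SELECT ?human ?humanLabel ?humanDescription ?death_date ?article WHERE {
      # Step 1: a small subquery that only finds the people.
      {
        SELECT ?human ?death_date WHERE {
          ?human wdt:P31 wd:Q5;
                 wdt:P570 ?death_date.
          # Assumption: a plain range comparison instead of year(...) = 2020.
          FILTER (?death_date >= "2020-01-01T00:00:00Z"^^xsd:dateTime &&
                  ?death_date <  "2021-01-01T00:00:00Z"^^xsd:dateTime)
        }
      }
      # Step 2: load further data for those people only.
      # Sitelinks state which wiki they belong to, so no REGEX on the URL is needed.
      ?article schema:about ?human;
               schema:isPartOf <https://en.wikipedia.org/>.
      # Step 3: load labels and descriptions last.
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
    ORDER BY ?death_date
    ```

    Further properties (birth place, gender, image and so on) would go into step 2, ideally as OPTIONAL patterns if people lacking them should still appear.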

    As for the duplicates: your query returns every possible combination of the values it matches. If a person has two birth dates on Wikidata, you get both; if the person also has multiple professions, you get every profession paired with each birth date, because each combination is a distinct result for your query. If you want each person just once, you have to aggregate the other columns, for example by taking the minimum or by grouping the values.
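
    To make the arithmetic concrete: a person with 2 birth dates and 3 professions produces 2 × 3 = 6 rows. A minimal sketch of the aggregation, assuming one row per person is wanted; the choice of MIN, SAMPLE and GROUP_CONCAT per column and the helper variable names (?birth, ?death, ?image, ?profName) are mine, not taken from the linked query:

    ```sparql
    SELECT ?human
           (MIN(?birth) AS ?birth_date)      # earliest stated birth date
           (MIN(?death) AS ?death_date)      # earliest stated death date
           (SAMPLE(?image) AS ?img)          # any one image
           (GROUP_CONCAT(DISTINCT ?profName; separator=", ") AS ?professions)
    WHERE {
      ?human wdt:P31 wd:Q5;
             wdt:P18 ?image;
             wdt:P569 ?birth;
             wdt:P570 ?death;
             wdt:P106 ?prof.
      FILTER (YEAR(?death) = 2020)
      # Fetch the English profession label directly so it can be aggregated.
      ?prof rdfs:label ?profName.
      FILTER (LANG(?profName) = "en")
    }
    GROUP BY ?human
    ORDER BY ?death_date
    ```

    Every selected variable must either appear in GROUP BY or be wrapped in an aggregate, which is why each multi-valued column gets its own aggregate here.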