I am a beginner at using SPARQL with Wikidata. I use it to get a list of people's data with a particular date of death, using this code:
SELECT ?human ?humanLabel ?humanDescription ?gender ?birth_date ?death_date ?bplace ?dplace ?img ?prof ?profLabel ?article WHERE {
?human wdt:P31 wd:Q5;
wdt:P18 ?img;
wdt:P19 ?bplace;
wdt:P20 ?dplace;
wdt:P21 ?gender;
wdt:P569 ?birth_date;
wdt:P570 ?death_date;
#rdfs:label ?name;
#schema:description ?description;
wdt:P106 ?prof.
?article schema:about ?human .
?article schema:inLanguage "en".
FILTER (year(?death_date) = 2020)
#FILTER(!REGEX(STR(?article), "^<https://en.wikipedia.org/"))
SERVICE wikibase:label {bd:serviceParam wikibase:language "en".}
}
ORDER BY ASC(?death_date)
But the response has a few duplicates (which make the response bigger).
Also, I need to receive only data with articles from en.wikipedia.org, but FILTER(!REGEX(STR(?article), "^<https://en.wikipedia.org/"))
makes the query much slower.
How can I solve this?
Your query has some problems:
Use
?article schema:isPartOf <https://en.wikipedia.org/> .
instead of filtering; that's faster.
The resulting query runs in less than 30 seconds: https://w.wiki/UN8
Generally speaking, it's best to first write a simple, fast query that returns all results, then load the other data, and load labels and descriptions last.
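As a rough sketch of that ordering (this is not necessarily what the linked query looks like, and it has not been timed against the endpoint), a subquery can find the people first, and the outer query can then attach the article and the labels. The three-step split and the reduced variable list are my assumptions for illustration:

```sparql
SELECT ?human ?humanLabel ?humanDescription ?death_date ?article WHERE {
  # Step 1: a small core query that only finds the people who died in 2020.
  {
    SELECT DISTINCT ?human ?death_date WHERE {
      ?human wdt:P31 wd:Q5;
             wdt:P570 ?death_date.
      FILTER (YEAR(?death_date) = 2020)
    }
  }
  # Step 2: attach other data, here the English Wikipedia article.
  ?article schema:about ?human;
           schema:isPartOf <https://en.wikipedia.org/>.
  # Step 3: load labels and descriptions last.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ASC(?death_date)
```

Further properties such as wdt:P19 or wdt:P106 would go in step 2, next to the article pattern.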
As for the duplicates: your query returns every possible combination of the values matched for each variable. So if a person has two birth dates on Wikidata, you'll get both; if there are also multiple professions, you'll get every profession paired with each birth date, because these are all distinct solutions for your query. If you want each person just once, you'll have to aggregate the other columns, for example by taking the minimum, grouping, etc.
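A minimal sketch of such an aggregation, assuming you want one row per person: it takes the earliest birth and death dates, picks an arbitrary image, and joins the profession labels into one string. The choice of MIN, SAMPLE and GROUP_CONCAT per column, and the helper names ?birth, ?death, ?img_ and ?professions, are illustrative assumptions; the query has not been checked against the endpoint's timeout. Because the profession labels are aggregated, the label service is used in its manual mode, with the labels requested explicitly:

```sparql
SELECT ?human ?humanLabel ?article
       (MIN(?birth) AS ?birth_date)
       (MIN(?death) AS ?death_date)
       (SAMPLE(?img_) AS ?img)
       (GROUP_CONCAT(DISTINCT ?profLabel; separator=", ") AS ?professions)
WHERE {
  ?human wdt:P31 wd:Q5;
         wdt:P569 ?birth;
         wdt:P570 ?death;
         wdt:P18 ?img_;
         wdt:P106 ?prof.
  ?article schema:about ?human;
           schema:isPartOf <https://en.wikipedia.org/>.
  FILTER (YEAR(?death) = 2020)
  # Manual mode: name the labels explicitly so they can be aggregated.
  SERVICE wikibase:label {
    bd:serviceParam wikibase:language "en".
    ?human rdfs:label ?humanLabel.
    ?prof rdfs:label ?profLabel.
  }
}
# One group, and therefore one result row, per person.
GROUP BY ?human ?humanLabel ?article
ORDER BY ASC(?death_date)
```

Every variable in the SELECT clause must either appear in GROUP BY or be wrapped in an aggregate, which is why the dates, image and professions each get one.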