I have a list of Wikidata IDs and I want to find which of those are subclasses (P279) of others.
Let's suppose I have the list, in pseudocode: ["Q42" (Douglas Adams), "Q752870" (motor vehicle), "Q1420" (motor car), "Q216762" (hatchback car)].
I'm trying to find a way to process this list and have as output something like:
[("Q752870", "Q1420"), ("Q1420", "Q216762")]
with the subclass pairs.
I could iterate over the list and run a custom SPARQL query for each pair, in pseudocode:
subclass_pairs = []
for a in ids:
    for b in ids:
        # custom_query_handler runs one SPARQL request per (a, b) pair
        if custom_query_handler(a, b):
            subclass_pairs.append((a, b))
But this implies a very large number of SPARQL requests.
How can I do this in a single SPARQL request? Is there any other possible solution?
While writing, I figured out the solution.
A SPARQL query like (for direct links)
SELECT * WHERE
{
  VALUES ?a {wd:Q42 wd:Q752870 wd:Q1420 wd:Q216762} .
  VALUES ?b {wd:Q42 wd:Q752870 wd:Q1420 wd:Q216762} .
  ?a wdt:P279 ?b .
}
or like (for direct and indirect links)
SELECT DISTINCT * WHERE
{
  VALUES ?a {wd:Q42 wd:Q752870 wd:Q1420 wd:Q216762} .
  VALUES ?b {wd:Q42 wd:Q752870 wd:Q1420 wd:Q216762} .
  FILTER (?a != ?b)
  ?a wdt:P279* ?b .
}
returns the list of pairs I wanted, with ?a as the subclass and ?b as the superclass in each row. Then it is just a matter of parsing the results in Python with something like SPARQLWrapper or wdcuration.
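For anyone who wants to avoid an extra dependency, here is a minimal sketch of that parsing step using only the standard library. The function names (`build_query`, `parse_pairs`, `fetch_pairs`) and the User-Agent string are my own placeholders, not part of any library, and I am assuming the public endpoint at https://query.wikidata.org/sparql returning the standard SPARQL JSON results format.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Wikidata Query Service endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"


def build_query(qids, transitive=False):
    """Build the query shown above for a list of QIDs such as 'Q42'."""
    values = " ".join(f"wd:{qid}" for qid in qids)
    # wdt:P279 matches direct links; wdt:P279* also follows indirect ones.
    path = "wdt:P279*" if transitive else "wdt:P279"
    return (
        "SELECT DISTINCT ?a ?b WHERE {\n"
        f"  VALUES ?a {{ {values} }}\n"
        f"  VALUES ?b {{ {values} }}\n"
        "  FILTER (?a != ?b)\n"
        f"  ?a {path} ?b .\n"
        "}"
    )


def parse_pairs(result):
    """Turn SPARQL JSON results into (subclass, superclass) QID pairs."""
    pairs = []
    for row in result["results"]["bindings"]:
        # Values are entity URIs; keep only the trailing QID.
        a = row["a"]["value"].rsplit("/", 1)[-1]
        b = row["b"]["value"].rsplit("/", 1)[-1]
        pairs.append((a, b))
    return pairs


def fetch_pairs(qids, transitive=False):
    """Send one GET request and return the parsed pairs (needs network access)."""
    params = urllib.parse.urlencode(
        {"query": build_query(qids, transitive), "format": "json"}
    )
    request = urllib.request.Request(
        f"{ENDPOINT}?{params}",
        headers={"User-Agent": "subclass-pairs-example/0.1"},  # placeholder value
    )
    with urllib.request.urlopen(request) as response:
        return parse_pairs(json.load(response))
```

Calling `fetch_pairs(["Q42", "Q752870", "Q1420", "Q216762"])` sends a single request for the whole list. Each returned tuple is ordered (subclass, superclass), so swap the elements if you want the superclass first.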
A very large list will have to be split into chunks, as the SPARQL URLs might become too long.
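One thing to watch when chunking: a subclass and its superclass can end up in different chunks, so querying each chunk only against itself would miss pairs. A rough sketch of one way around this is to query every combination of chunks, with ?a taken from the first chunk and ?b from the second. The helper names `chunks` and `chunk_pairs` are hypothetical, and the chunk size is something you would have to tune yourself.

```python
from itertools import product


def chunks(items, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def chunk_pairs(qids, size):
    """Return every (a_chunk, b_chunk) combination so no cross-chunk pair is missed."""
    parts = chunks(list(qids), size)
    return list(product(parts, repeat=2))


# Each combination becomes one query: the first chunk fills VALUES ?a,
# the second fills VALUES ?b.
for a_chunk, b_chunk in chunk_pairs(["Q42", "Q752870", "Q1420", "Q216762"], 2):
    print(a_chunk, b_chunk)
```

The number of requests grows with the square of the number of chunks, so it is worth using the largest chunk size that still works.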