python sparql wikidata

Given a list of Wikidata identifiers, is there a way to find which ones are directly related using Python and/or SPARQL?


I have a list of Wikidata IDs and I want to find which of those are subclasses (P279) of others.

Suppose I have this list (in pseudocode): ["Q42" (Douglas Adams), "Q752870" (motor vehicle), "Q1420" (motor car), "Q216762" (hatchback car)].

I'm trying to find a way to process this list and get output like:

[("Q752870", "Q1420"), ("Q1420", "Q216762")], i.e. the subclass pairs.

I could iterate over the list and run a custom SPARQL query for each pair, in pseudocode:

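# qids is the list of Wikidata IDs above
subclass_pairs = []
for a in qids:
    for b in qids:
        # custom_query_handler(a, b) stands for one SPARQL request per pair
        if custom_query_handler(a, b):
            subclass_pairs.append((a, b))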

But this means one SPARQL request per ordered pair, so the number of requests grows quadratically with the length of the list.

How can I do this in a single SPARQL request? Is there any other possible solution?



Solution

  • While writing, I figured out the solution.

    A SPARQL query like (for direct links)

    SELECT * WHERE
    {
      VALUES ?a { wd:Q42 wd:Q752870 wd:Q1420 wd:Q216762 }
      VALUES ?b { wd:Q42 wd:Q752870 wd:Q1420 wd:Q216762 }

      ?a wdt:P279 ?b .
    }
    

    or like (for direct and indirect links)

    SELECT DISTINCT * WHERE
    {
      VALUES ?a { wd:Q42 wd:Q752870 wd:Q1420 wd:Q216762 }
      VALUES ?b { wd:Q42 wd:Q752870 wd:Q1420 wd:Q216762 }
      FILTER (?a != ?b)

      ?a wdt:P279* ?b .
    }
    

    returns a list of pairs like the one I wanted; each row (?a, ?b) means that ?a is a subclass of ?b. After that, it is just a matter of parsing the results in Python with something like SPARQLWrapper or wdcuration.
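
For illustration, here is a minimal sketch of the Python side. It uses only the standard library (`urllib`) instead of SPARQLWrapper or wdcuration so that it stays self-contained. The function names `build_subclass_query`, `run_query` and `parse_pairs` are made up for this example, and it assumes the public endpoint at https://query.wikidata.org/sparql, which predefines the `wd:` and `wdt:` prefixes.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_subclass_query(qids, transitive=False):
    """Build a query like the ones above for an arbitrary list of QIDs."""
    values = " ".join(f"wd:{qid}" for qid in qids)
    # P279* follows subclass links transitively; plain P279 is direct only.
    path = "wdt:P279*" if transitive else "wdt:P279"
    return (
        "SELECT DISTINCT ?a ?b WHERE {\n"
        f"  VALUES ?a {{ {values} }}\n"
        f"  VALUES ?b {{ {values} }}\n"
        "  FILTER (?a != ?b)\n"
        f"  ?a {path} ?b .\n"
        "}"
    )


def run_query(query, user_agent="subclass-pairs-example/0.1"):
    """Send the query with a GET request and return the decoded JSON."""
    params = urllib.parse.urlencode({"query": query, "format": "json"})
    request = urllib.request.Request(
        f"{WDQS_ENDPOINT}?{params}", headers={"User-Agent": user_agent}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def parse_pairs(results):
    """Convert SPARQL JSON results into (subclass, superclass) QID tuples."""
    pairs = []
    for row in results["results"]["bindings"]:
        # Values are full entity URIs; keep only the trailing QID.
        a = row["a"]["value"].rsplit("/", 1)[-1]
        b = row["b"]["value"].rsplit("/", 1)[-1]
        pairs.append((a, b))
    return pairs


# Usage (performs a network request):
# qids = ["Q42", "Q752870", "Q1420", "Q216762"]
# pairs = parse_pairs(run_query(build_subclass_query(qids)))
```

Note that the tuples come back as (subclass, superclass); swap the two elements if the opposite order is needed.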

    A very large list will have to be split into chunks, as the SPARQL URLs might become too long.
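
One caveat when chunking: a subclass link can connect items that end up in different chunks, so querying each chunk only against itself would miss pairs. A possible approach, sketched below, is to issue one query per ordered pair of chunks, taking `?a` from one chunk and `?b` from the other. The helper names `chunked` and `build_chunk_queries` and the chunk size are hypothetical and chosen for illustration only.

```python
from itertools import product


def chunked(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_chunk_queries(qids, size, transitive=False):
    """Yield one query per ordered pair of chunks so that links
    between items in different chunks are not missed."""
    path = "wdt:P279*" if transitive else "wdt:P279"
    chunks = chunked(qids, size)
    for chunk_a, chunk_b in product(chunks, repeat=2):
        values_a = " ".join(f"wd:{qid}" for qid in chunk_a)
        values_b = " ".join(f"wd:{qid}" for qid in chunk_b)
        yield (
            "SELECT DISTINCT ?a ?b WHERE {\n"
            f"  VALUES ?a {{ {values_a} }}\n"
            f"  VALUES ?b {{ {values_b} }}\n"
            "  FILTER (?a != ?b)\n"
            f"  ?a {path} ?b .\n"
            "}"
        )


qids = ["Q42", "Q752870", "Q1420", "Q216762"]
queries = list(build_chunk_queries(qids, size=2))
```

With k chunks this produces k * k queries, which is still far fewer requests than one per pair of items as long as the chunks are reasonably large.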