
How can I avoid timeout on a SPARQL query on Wikidata?


I am trying to extract all items of a category on Wikidata, along with their respective page titles in English. It works fine as long as the category does not contain many items, as in this query:

SELECT ?work ?workLabel
WHERE
{
  ?work wdt:P31/wdt:P279* wd:Q734454.
  ?work rdfs:label ?workLabel .
  FILTER ( LANGMATCHES ( LANG ( ?workLabel ), "en" ) ) 
}
ORDER BY ?work

but it times out (Query timeout limit reached) as soon as I use a category with more items, such as Q2188189.

I have tried using LIMIT or OFFSET clauses but this does not change the result.

I have also tried inserting a filter like FILTER (regex(?work, '.*Q1.*')) . to slice the query into subsets, also without success (No matching records found).

For now I extract only the ids and then run a separate query to get the page title for each one of them, but that seems silly.

Is there a way to work around the timeout?


Solution

  • Standard method

    If you want the page titles of all the musical works that have an article on en.wikipedia.org, you can use the following query:

    SELECT ?work ?workTitle
    WHERE
    {
      ?work wdt:P31/wdt:P279* wd:Q2188189.
      ?workLink schema:about ?work ;
        schema:isPartOf <https://en.wikipedia.org/> ;
        schema:name ?workTitle .
    }
    

    I ran it three times, and two of those runs finished without exceeding the timeout.

    Alternative method

    If that query still times out for you, the only workaround I can think of is to retrieve all the possible types (i.e. subclasses) of musical work, and adapt the above query to the single-class case.

    So, the first step is:

    SELECT ?workType WHERE { ?workType wdt:P279* wd:Q2188189. }
    

    You'll get more than a thousand results. For each of them (take for example the result Q2743), you'll then have to run the following query:

    SELECT ?work ?workTitle
    WHERE
    {
      ?work wdt:P31 wd:Q2743.
      ?workLink schema:about ?work ;
        schema:isPartOf <https://en.wikipedia.org/> ;
        schema:name ?workTitle .
    }
    

    This will return all the items that are directly instances of Q2743, without caring about subclasses.

    This method is a bit cumbersome, but you can use it if you don't mind running many queries. The idea is to spread the complexity over many small queries, so that each of them is less likely to exceed the timeout.
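If you go this way, you will probably want to script the loop rather than paste a thousand queries by hand. Here is a minimal sketch in Python using only the standard library. It is an illustration under some assumptions, not a tested production script. I assume the public endpoint at https://query.wikidata.org/sparql accepts the query as a `query` parameter with `format=json`. The helper names (`qid_from_uri`, `per_class_query`, `run_query`, `fetch_all_titles`), the User-Agent string and the one-second pause are all made up for this example.

```python
import json
import time
import urllib.parse
import urllib.request

# Assumption: the public Wikidata SPARQL endpoint, queried over HTTP GET.
ENDPOINT = "https://query.wikidata.org/sparql"

# First step from the answer: list every subclass of Q2188189.
SUBCLASS_QUERY = "SELECT ?workType WHERE { ?workType wdt:P279* wd:Q2188189. }"

# Second step from the answer, with the class id left as a placeholder.
# Doubled braces are literal braces for str.format.
PER_CLASS_TEMPLATE = """SELECT ?work ?workTitle
WHERE
{{
  ?work wdt:P31 wd:{qid}.
  ?workLink schema:about ?work ;
    schema:isPartOf <https://en.wikipedia.org/> ;
    schema:name ?workTitle .
}}"""


def qid_from_uri(uri):
    """Turn 'http://www.wikidata.org/entity/Q2743' into 'Q2743'."""
    return uri.rsplit("/", 1)[-1]


def per_class_query(qid):
    """Build the single-class query for one class id such as 'Q2743'."""
    if not (qid.startswith("Q") and qid[1:].isdigit()):
        raise ValueError("not a Wikidata item id: %r" % qid)
    return PER_CLASS_TEMPLATE.format(qid=qid)


def run_query(query):
    """Send one query to the endpoint and return the list of result bindings."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    # Hypothetical User-Agent; replace it with something that identifies you.
    request = urllib.request.Request(
        url, headers={"User-Agent": "example-script/0.1 (hypothetical)"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["results"]["bindings"]


def fetch_all_titles(pause_seconds=1.0):
    """Run one small query per subclass and merge the results into a dict."""
    titles = {}
    for row in run_query(SUBCLASS_QUERY):
        qid = qid_from_uri(row["workType"]["value"])
        for item in run_query(per_class_query(qid)):
            titles[item["work"]["value"]] = item["workTitle"]["value"]
        time.sleep(pause_seconds)  # arbitrary pause between requests
    return titles
```

Calling `fetch_all_titles()` would then return a dictionary mapping each item URI to its English Wikipedia page title. Any single query can still fail or time out, so in practice you would likely wrap `run_query` in a try/except and retry or skip the classes that fail.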