query-optimization semantic-web sparql dbpedia

Optimization of SPARQL query. [ Estimated execution time exceeds the limit of 1500 (sec) ]


I am trying to run this query on http://dbpedia.org/sparql, but I get an error saying that my query is too expensive. When I run the query through http://dbpedia.org/snorql/ I get:

The estimated execution time 25012730 (sec) exceeds the limit of 1500 (sec) ...

When running the query through my Python script using SPARQLWrapper, I simply get an HTTP 500.

I figure I need to do something to optimize my SPARQL query. I need the data to iterate over educational institutions and import them into a local database. Maybe I am using SPARQL wrong and should do this in a fundamentally different way.

Hope someone can help me!

The query

PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX : <http://dbpedia.org/resource/>
PREFIX dbpedia2: <http://dbpedia.org/property/>
PREFIX dbpedia: <http://dbpedia.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

            SELECT DISTINCT ?uri
                ?name
                ?homepage
                ?student_count
                ?native_name
                ?city
                ?country
                ?type
                ?lat ?long
                ?image

            WHERE {
                ?uri rdf:type dbpedia-owl:EducationalInstitution .
                ?uri foaf:name ?name .
                OPTIONAL { ?uri foaf:homepage ?homepage } .
                OPTIONAL { ?uri dbpedia-owl:numberOfStudents ?student_count } .
                OPTIONAL { ?uri dbpprop:nativeName ?native_name } .
                OPTIONAL { ?uri dbpprop:city ?city } .
                OPTIONAL { ?uri dbpprop:country ?country } .
                OPTIONAL { ?uri dbpprop:type ?type } .
                OPTIONAL { ?uri geo:lat ?lat . ?uri geo:long ?long } .
                OPTIONAL { ?uri foaf:depiction ?image } .
            }
            ORDER BY ?uri
            LIMIT 20 OFFSET 10

Solution

  • Forget it. You won't be able to get those results back from DBpedia with a single SPARQL query. All those OPTIONAL clauses are very expensive.

    To work around it, first run something like:

     SELECT DISTINCT ?uri WHERE {
                    ?uri rdf:type dbpedia-owl:EducationalInstitution .
                    ?uri foaf:name ?name .
     } ORDER BY ?uri
     LIMIT 20 OFFSET 10
    

    Then iterate over the result set of this query and issue one query per dbpedia-owl:EducationalInstitution, such as the following (notice the FILTER at the end of the query):

            SELECT DISTINCT ?uri
                ?name
                ?homepage
                ?student_count
                ?native_name
                ?city
                ?country
                ?type
                ?lat ?long
                ?image
    
            WHERE {
                ?uri rdf:type dbpedia-owl:EducationalInstitution .
                ?uri foaf:name ?name .
                OPTIONAL { ?uri foaf:homepage ?homepage } .
                OPTIONAL { ?uri dbpedia-owl:numberOfStudents ?student_count } .
                OPTIONAL { ?uri dbpprop:nativeName ?native_name } .
                OPTIONAL { ?uri dbpprop:city ?city } .
                OPTIONAL { ?uri dbpprop:country ?country } .
                OPTIONAL { ?uri dbpprop:type ?type } .
                OPTIONAL { ?uri geo:lat ?lat . ?uri geo:long ?long } .
                OPTIONAL { ?uri foaf:depiction ?image } .
                FILTER (?uri = <http://dbpedia.org/resource/%C3%89cole_%C3%A9l%C3%A9mentaire_Marie-Curie>)
            }
    

    Where <http://dbpedia.org/resource/%C3%89cole_%C3%A9l%C3%A9mentaire_Marie-Curie> has been obtained from the first query.

    And yes, it will be slow: you issue one request per institution, so you might not be able to run this for an online application. Advice: put some sort of caching mechanism between your app and the DBpedia SPARQL endpoint.
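
    Since the question mentions a Python script, here is a minimal sketch of the two-step approach with a simple cache. It is only an illustration under a few assumptions: it uses the standard library (urllib) instead of SPARQLWrapper so that it is self-contained, the function names (list_query, detail_query, run_query, fetch_institutions) are made up for this example, the in-memory dict stands in for whatever persistent cache you would really use, and the dbpedia-owl, dbpprop and geo namespace URIs are assumed to match the ones the endpoint predefines.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint from the question

# Declared explicitly so the sketch does not rely on the endpoint's
# predefined namespaces (the URIs below are assumptions, see lead-in).
PREFIXES = """
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX dbpprop: <http://dbpedia.org/property/>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
"""

# Graph patterns of the per-institution query, as in the answer above.
DETAIL_PATTERNS = """
    ?uri rdf:type dbpedia-owl:EducationalInstitution .
    ?uri foaf:name ?name .
    OPTIONAL { ?uri foaf:homepage ?homepage } .
    OPTIONAL { ?uri dbpedia-owl:numberOfStudents ?student_count } .
    OPTIONAL { ?uri dbpprop:nativeName ?native_name } .
    OPTIONAL { ?uri dbpprop:city ?city } .
    OPTIONAL { ?uri dbpprop:country ?country } .
    OPTIONAL { ?uri dbpprop:type ?type } .
    OPTIONAL { ?uri geo:lat ?lat . ?uri geo:long ?long } .
    OPTIONAL { ?uri foaf:depiction ?image } .
"""


def list_query(limit, offset):
    """Step 1: the cheap query that returns only one page of URIs."""
    return PREFIXES + (
        "SELECT DISTINCT ?uri WHERE {\n"
        "    ?uri rdf:type dbpedia-owl:EducationalInstitution .\n"
        "    ?uri foaf:name ?name .\n"
        "} ORDER BY ?uri\n"
        "LIMIT %d OFFSET %d" % (int(limit), int(offset))
    )


def detail_query(uri):
    """Step 2: the full query, restricted to a single URI by the FILTER."""
    # Reject characters that are not allowed inside <...> in SPARQL.
    if any(c in '<>"{}|^`\\' or ord(c) <= 0x20 for c in uri):
        raise ValueError("not a valid IRI: %r" % uri)
    return PREFIXES + (
        "SELECT DISTINCT ?uri ?name ?homepage ?student_count ?native_name\n"
        "    ?city ?country ?type ?lat ?long ?image\n"
        "WHERE {" + DETAIL_PATTERNS +
        "    FILTER (?uri = <" + uri + ">)\n}"
    )


def run_query(query, endpoint=ENDPOINT):
    """Send the query over HTTP GET and return the JSON result bindings."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)["results"]["bindings"]


def fetch_institutions(limit, offset, run=run_query, cache=None):
    """Run step 1, then step 2 once per URI that is not in the cache yet."""
    cache = {} if cache is None else cache
    rows = []
    for binding in run(list_query(limit, offset)):
        uri = binding["uri"]["value"]
        if uri not in cache:
            cache[uri] = run(detail_query(uri))
        rows.extend(cache[uri])
    return rows
```

    Calling fetch_institutions(20, 10) would issue one list query followed by one detail query per URI. Passing the same cache dict on later calls skips the detail queries for URIs that were already fetched; the run parameter is only there so that the HTTP call can be swapped out, for example in tests.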