
Simplify SPARQL query


I’m trying to make a rather complex call to DBPedia using a SPARQL query. I’d like to get some information about a city (district, federal state/»Bundesland«, postal codes, coordinates and geographically related cities).


SELECT * WHERE {
  #input
  ?x rdfs:label "Bentzin"@de.

  #district
  OPTIONAL {
    ?x dbpedia-owl:district ?district
    # ?x dbpprop:landkreis ?district
    { SELECT * WHERE {
       ?district rdfs:label ?districtName
       FILTER(lang(?districtName) = "de")

       ?district dbpprop:capital ?districtCapital
       { SELECT * WHERE {
         ?districtCapital rdfs:label ?districtCapitalName
         FILTER(lang(?districtCapitalName) = "de")
       }}
    }}
  }

  #federal state
  OPTIONAL {
    # ?x dbpprop:bundesland ?land
    ?x dbpedia-owl:federalState ?land
    { SELECT * WHERE {
        ?land rdfs:label ?landName
        FILTER(lang(?landName) = "de")
    }}
  }

  #postal codes
  ?x dbpedia-owl:postalCode ?zip.

  #coordinates
  ?x geo:lat ?lat.
  ?x geo:long ?long

  #cities in the south
  OPTIONAL {
    ?x dbpprop:south ?south
    {SELECT * WHERE {
      ?south rdfs:label ?southName
      FILTER(lang(?southName) = "de")
    }}
  }

  #cities in the north
  OPTIONAL {
    ?x dbpprop:north ?north
    { SELECT * WHERE {
       ?north rdfs:label ?northName
       FILTER(lang(?northName) = "de")
    }}
  }

  #cities in the west
  ...

}

This works in some cases, however, there are a few major problems.

  1. There are several different properties that may contain the value for the federal state or district. Sometimes it’s dbpprop:landkreis (the German word for district); in other cases it’s dbpedia-owl:district. Is it possible to combine those two in cases where only one of them is set?

  2. Further, I’d like to read out the names of the cities in the north, northwest, …. Sometimes, these cities are referenced in dbpprop:north etc. The basic query for each direction is the same:

    OPTIONAL {
      ?x dbpprop:north ?north
      { SELECT * WHERE {
        ?north rdfs:label ?northName
        FILTER(lang(?northName) = "de")
      }}
    }
    

    I really don’t want to repeat that eight times for every direction, is there any way to simplify this?

  3. Sometimes, there are multiple other cities referenced (example). In those cases, there are multiple datasets returned. Is there any possibility to get a list of the names of those cities in a single dataset instead?

    +---+---+---------------------------------------------------------------+
    | x | … |                            southName                          |
    +---+---+---------------------------------------------------------------+
    | … | … | "Darmstadt"@de, "Stuttgart"@de, "Karlsruhe"@de, "Mannheim"@de |
    +---+---+---------------------------------------------------------------+
    

Your feedback and your ideas are greatly appreciated!

Till


Solution

  • There are several different properties that may contain the value for the federal state or district. Sometimes it’s dbpprop:landkreis (the German word for district); in other cases it’s dbpedia-owl:district. Is it possible to combine those two in cases where only one of them is set?

    SPARQL property paths are great for this. You can just say

    ?subject dbpprop:landkreis|dbpedia-owl:district ?district
    

    If there are more properties, you'll probably prefer a version with values:

    values ?districtProperty { dbpprop:landkreis dbpedia-owl:district }
    ?subject ?districtProperty ?district
    

  • Further, I’d like to read out the names of the cities in the north, northwest, …. Sometimes, these cities are referenced in dbpprop:north etc. The basic query for each direction is the same:

    OPTIONAL {
      ?x dbpprop:north ?north
      { SELECT * WHERE {
        ?north rdfs:label ?northName
        FILTER(lang(?northName) = "de")
      }}
    }
    

    Again, values comes to the rescue. Also, don't use lang(…) = … to filter languages; use langMatches, which also accepts regional variants of a language tag such as de-AT:

    optional {
      values ?directionProp { dbpprop:north
                              #-- ...
                              dbpprop:south }
      ?subject ?directionProp ?direction 
      optional { 
        ?direction rdfs:label ?directionLabel
        filter langMatches(lang(?directionLabel),"de")
      }
    }
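
    To see the difference between the two filters in isolation, here is a small self-contained sketch. The three labels are made-up inline data (an assumption for illustration, not taken from DBpedia), so the query does not depend on any prefixes or on the endpoint's data:

    ```sparql
    select ?label where {
      #-- hypothetical sample labels, supplied inline
      values ?label { "Wien"@de "Wien"@de-AT "Vienna"@en }
      #-- keeps both "de" and "de-AT"; lang(?label) = "de" would keep only the first
      filter langMatches(lang(?label), "de")
    }
    ```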
    

  • Sometimes, there are multiple other cities referenced (example). In those cases, there are multiple datasets returned. Is there any possibility to get a list of the names of those cities in a single dataset instead?

    +---+---+---------------------------------------------------------------+
    | x | … |                            southName                          |
    +---+---+---------------------------------------------------------------+
    | … | … | "Darmstadt"@de, "Stuttgart"@de, "Karlsruhe"@de, "Mannheim"@de |
    +---+---+---------------------------------------------------------------+
    

    That's what group by and group_concat are for. See Aggregating results from SPARQL query. I don't actually see these results in the query you gave though, so I don't have good data to test a result with.
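
    As a rough sketch of what that could look like for the southern neighbours (untested against the live data for the reason above; it assumes the dbpprop: and rdfs: prefixes that the DBpedia endpoint predefines, and ?southNames is just an illustrative variable name):

    ```sparql
    select ?x (group_concat(distinct ?southName; separator=", ") as ?southNames)
    where {
      ?x rdfs:label "Bentzin"@de .
      optional {
        #-- follow dbpprop:south and take the label of whatever it points to
        ?x dbpprop:south/rdfs:label ?southName
        filter langMatches(lang(?southName), "de")
      }
    }
    group by ?x
    ```

    Each ?x then comes back as a single row, with all German names joined into one string.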

    You also seem to be doing a lot of unnecessary subselects. You can just put additional triples in the graph pattern; you don't need a nested query to get additional information.

    With those considerations, your query becomes:

    select * where {
      ?x rdfs:label "Bentzin"@de ;
         dbpedia-owl:postalCode ?zip ;
         geo:lat ?lat ;
         geo:long ?long
    
      #-- district
      optional {
        ?x dbpedia-owl:district|dbpprop:landkreis ?district .
        ?district rdfs:label ?districtName
        filter langMatches(lang(?districtName),"de")
        optional {
          ?district dbpprop:capital ?districtCapital .
          ?districtCapital rdfs:label ?districtCapitalName
          filter langMatches(lang(?districtCapitalName),"de")
        }
      }
    
      #federal state
      optional  {
        ?x dbpprop:bundesland|dbpedia-owl:federalState ?land .
        ?land rdfs:label ?landName
        filter langMatches(lang(?landName),"de")
      }
    
      values ?directionProp { dbpprop:south dbpprop:north }
      optional {
        ?x ?directionProp ?directionPlace .
        ?directionPlace rdfs:label ?directionName 
        filter langMatches(lang(?directionName),"de")
      }
    }
    


    Now, if you're just looking for the names of these things, without the associated URIs, you can use property paths to shorten many of the patterns that retrieve labels. E.g.:

    select * where {
      ?x rdfs:label "Bentzin"@de ;
         dbpedia-owl:postalCode ?zip ;
         geo:lat ?lat ;
         geo:long ?long
    
      #-- district
      optional {
        ?x (dbpedia-owl:district|dbpprop:landkreis)/rdfs:label ?districtName
        filter langMatches(lang(?districtName),"de")
        optional {
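          #-- ?district is no longer bound in this version, so start the path from ?x again
          ?x (dbpedia-owl:district|dbpprop:landkreis)/dbpprop:capital/rdfs:label ?districtCapitalName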
          filter langMatches(lang(?districtCapitalName),"de")
        }
      }
    
      #-- federal state
      optional  {
        ?x (dbpprop:bundesland|dbpedia-owl:federalState)/rdfs:label ?landName
        filter langMatches(lang(?landName),"de")
      }
    
      optional {
        values ?directionProp { dbpprop:south dbpprop:north }
        ?x ?directionProp ?directionPlace .
        ?directionPlace rdfs:label ?directionName
        filter langMatches(lang(?directionName),"de")
      }
    }
    

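
    If one row per direction is enough, the values block can be combined with group by and group_concat. This is only a sketch under the same assumptions as the queries above (the prefixes predefined by the DBpedia endpoint, and only two of the direction properties listed); ?directionNames is an illustrative name:

    ```sparql
    select ?x ?directionProp
           (group_concat(distinct ?directionName; separator=", ") as ?directionNames)
    where {
      ?x rdfs:label "Bentzin"@de .
      #-- add the remaining direction properties here as needed
      values ?directionProp { dbpprop:south dbpprop:north }
      #-- a variable predicate cannot be used inside a property path,
      #-- so the label lookup stays a separate triple pattern
      ?x ?directionProp ?directionPlace .
      ?directionPlace rdfs:label ?directionName
      filter langMatches(lang(?directionName), "de")
    }
    group by ?x ?directionProp
    ```

    This returns one row for each direction that has at least one German-labelled place. Directions without any match are simply absent, so it is best run alongside the main query rather than merged into it.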