python sparql wikipedia dbpedia

SPARQL: Exact rdfs:label query yields inconsistent results


I'm trying to get data such as population, latitude, longitude, etc. for specific cities by doing an exact-match search using SPARQL. The following query works for a city such as Barcelona, but yields no results for a city like Bilbao. So, the following:

SELECT DISTINCT ?city ?country ?lat ?lon ?population ?area ?elevation WHERE {
  ?city rdf:type <http://dbpedia.org/ontology/City> .
  ?city <http://dbpedia.org/ontology/country> ?country .
  ?city <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?lat .
  ?city <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?lon .
  ?city <http://dbpedia.org/ontology/populationTotal> ?population .
  ?city <http://dbpedia.org/ontology/PopulatedPlace/areaTotal> ?area .
  ?city <http://dbpedia.org/ontology/elevation> ?elevation ;
        rdfs:label "Barcelona"@en .
}

returns

:Barcelona  :Spain  41.3833 2.18333 1604555 "101.4"^^dbpedia:datatype/squareKilometre   12.0

but the same block with the line:

rdfs:label "Bilbao"@en .

comes back empty. The query also fails for cities like Valencia, Bogota, and Binasco. I would like to perform the search without filters, if at all possible. I've gotten mixed results with the following filtered query:

SELECT ?city ?country ?lat ?lon ?population ?area ?elevation ?label WHERE {
  ?city rdf:type <http://dbpedia.org/ontology/City> .
  ?city <http://dbpedia.org/ontology/country> ?country .
  ?city <http://www.w3.org/2003/01/geo/wgs84_pos#lat> ?lat .
  ?city <http://www.w3.org/2003/01/geo/wgs84_pos#long> ?lon .
  ?city <http://dbpedia.org/ontology/populationTotal> ?population .
  ?city <http://dbpedia.org/ontology/PopulatedPlace/areaTotal> ?area .
  ?city <http://dbpedia.org/ontology/elevation> ?elevation ;
        rdfs:label ?label .
  FILTER contains(?label, "Bilbao")
  FILTER langMatches(lang(?label), 'en')
}
LIMIT 100

Any thoughts would be appreciated.


Solution

  • DBpedia data is far from homogeneous, so you have to make sure that your query really fits the data. For example, the DBpedia resource for Bilbao doesn't belong to the class dbo:City, which you can check by listing its classes:

    SELECT * { dbr:Bilbao a ?cls }
    

    Among others, it belongs to the classes

    +------------------------------------------------------------+
    |                            cls                             |
    +------------------------------------------------------------+
    | http://www.wikidata.org/entity/Q486972                     |
    | http://www.w3.org/2003/01/geo/wgs84_pos#SpatialThing       |
    | http://www.w3.org/2002/07/owl#Thing                        |
    | http://umbel.org/umbel/rc/Village                          |
    | http://umbel.org/umbel/rc/PopulatedPlace                   |
    | http://umbel.org/umbel/rc/Location_Underspecified          |
    | http://schema.org/Place                                    |
    | http://dbpedia.org/ontology/Settlement                     |
    | http://dbpedia.org/ontology/PopulatedPlace                 |
    | http://dbpedia.org/ontology/Place                          |
    | http://dbpedia.org/ontology/Location                       |
    | http://dbpedia.org/class/yago/YagoPermanentlyLocatedEntity |
    | http://dbpedia.org/class/yago/YagoLegalActorGeo            |
    | http://dbpedia.org/class/yago/YagoGeoEntity                |
    | ...                                                        |
    +------------------------------------------------------------+
    

    The same holds for the properties: a resource only matches if every property in the query exists for it. If you cannot guarantee that, wrap the triple patterns in OPTIONAL clauses. Note that such a query might be more expensive because OPTIONAL is evaluated as a left join.
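
As a rough illustration of the OPTIONAL advice, here is a minimal Python sketch that only builds the query string and does not contact any endpoint. The helper name `build_city_query` and the constants are hypothetical, introduced just for this example. It assumes dbo:Settlement as the class because that is one of the classes listed for Bilbao above; whether that class fits every city you care about is something you would need to check against the data.

```python
# Hypothetical helper for illustration: builds a SPARQL query string only,
# it does not send anything to DBpedia.
PREFIXES = """\
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX geo: <http://www.w3.org/2003/01/geo/wgs84_pos#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
"""

# Properties taken from the question, keyed by the variable each one binds.
# areaTotal keeps its full IRI because its local name contains a slash.
OPTIONAL_PROPERTIES = {
    "country": "dbo:country",
    "lat": "geo:lat",
    "lon": "geo:long",
    "population": "dbo:populationTotal",
    "area": "<http://dbpedia.org/ontology/PopulatedPlace/areaTotal>",
    "elevation": "dbo:elevation",
}


def build_city_query(label, lang="en", cls="dbo:Settlement"):
    """Return a query with an exact label match and every property optional."""
    # Escape backslashes and double quotes so the label stays a valid literal.
    safe_label = label.replace("\\", "\\\\").replace('"', '\\"')
    variables = " ".join(f"?{name}" for name in OPTIONAL_PROPERTIES)
    # One OPTIONAL block per property, so a missing value leaves the
    # variable unbound instead of discarding the whole row.
    optionals = "\n".join(
        f"  OPTIONAL {{ ?city {prop} ?{name} . }}"
        for name, prop in OPTIONAL_PROPERTIES.items()
    )
    return (
        f"{PREFIXES}"
        f"SELECT DISTINCT ?city {variables} WHERE {{\n"
        f"  ?city a {cls} ;\n"
        f'        rdfs:label "{safe_label}"@{lang} .\n'
        f"{optionals}\n"
        f"}}"
    )


print(build_city_query("Bilbao"))
```

The class and the label stay mandatory, so the exact-match behaviour the question asks for is preserved without a FILTER. Only the descriptive properties are optional, which means a city with no elevation or area recorded still comes back, with those columns empty.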