
SPARQL: get biggest city with female mayor


I am trying to get all cities with a female mayor. My problem is that most mayors, when they are present in the data at all, are not linked to a person resource; they are only given as a string. So how can I get the gender?

My query to get all mayors is:

SELECT * WHERE {
 ?city a dbo:City .
 ?city dbo:populationTotal ?pop .
 ?city (dbp:mayor | dbo:mayor | dbp:leader |dbo:leader) ?mayor

}
ORDER BY DESC(?pop)

I am not sure whether I get all mayors, because there are only 500 mayors versus 19584 cities with a given population, which seems too few.

Since I could not get the gender this way, I ran a second query to get all female people in DBpedia and tried to compare both results, but this mostly ends in a timeout or "no result".

One example query that ran into a timeout was:

SELECT ?name ?sayor WHERE {
 ?person a dbo:Person .
 ?person foaf:gender ?gender .
 FILTER regex(?gender, "^female$", "i") .
 ?person rdfs:label ?name .
 {
  SELECT str(?mayor) AS ?sayor WHERE {
   ?city a dbo:City .
   ?city (dbp:mayor | dbp:leader) ?mayor .
   FILTER (str(?mayor) = str(?name))
  }
 }
}

Does anyone have an idea how to get all cities with a female mayor? I would also be happy with some suggestions.


Solution

  • This is a particular problem that arises from multiple factors. Here are the main two:

    First, the awkward property structure behind dbo:Settlement: the Wikipedia template Infobox settlement does not have a direct leader/mayor property. Since these templates were not created with a clear object structure in mind, the leader properties are flattened in this template into:

    | leader_title = [[Mayor of Chicago|Mayor]]
    | leader_name = [[Rahm Emanuel]]
    | leader_party = [[Democratic Party (United States)|D]]
    | leader_title1 = [[City council|Council]]
    | leader_name1 = [[Chicago City Council]]
    

    This leaves editors with no real option but to 'misuse' the leader_name property: instead of putting down the name as a literal, they point to the resource (person) in question. Other Infobox templates do have a mayor/leader property, but this is the one most often used to describe cities in the English Wikipedia.

    Second, the quite rigid behavior of the DBpedia mappings, which have trouble dealing with Infobox properties that can hold either a literal or a resource link. At the very least, it is difficult for mapping editors to tackle this. DBpedia is dealing with this (and other issues) right now by introducing RML mappings.

    This can, of course, appear in a similar fashion in other dbo classes as well. While DBpedia is looking into such issues, keeping the mappings from Wikipedia templates to the DBpedia ontology up to date is one way to solve this, and it is something anyone can contribute to.
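
    To see how the values actually split between linked resources and plain literals, a diagnostic query along the following lines might help. It is only a sketch: it assumes the dbo: and dbp: prefixes are predefined on the endpoint (as they are in the queries in the question), it uses dbo:City as in the question's query, and it may be slow or time out on the public endpoint.

    ```sparql
    # Count leader/mayor values that are linked resources (IRIs)
    # versus values that are plain literals.
    SELECT ?linked (COUNT(*) AS ?n) WHERE {
      ?city a dbo:City ;
            (dbp:mayor | dbo:mayor | dbp:leader | dbo:leader | dbo:leaderName) ?mayor .
      # true when ?mayor points to a resource, false when it is a literal
      BIND(isIRI(?mayor) AS ?linked)
    }
    GROUP BY ?linked
    ```

    Only the rows counted under ?linked = true can be joined to a person's foaf:gender; a literal cannot be the subject of a triple.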

    With this in mind, we can solve your original problem:

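    SELECT * WHERE {
      # ?plebs ranges over the direct subclasses of dbo:Settlement (e.g. dbo:City)
      ?plebs rdfs:subClassOf dbo:Settlement .
      ?city a ?plebs .
      # dbo:leaderName is added to the property alternatives from the question
      ?city (dbp:mayor | dbo:mayor | dbp:leader | dbo:leader | dbo:leaderName) ?mayor .
      # only mayors that are linked resources can carry a foaf:gender
      ?mayor foaf:gender ?gender .
      ?city dbo:populationTotal ?pop .
      FILTER(str(?gender) = "female")
    }
    ORDER BY DESC(?pop)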
    

    This returns the following top three (city, mayor, gender, population):

    http://dbpedia.org/resource/Tokyo       http://dbpedia.org/resource/Yuriko_Koike      "female"@en    13617445
    http://dbpedia.org/resource/Mumbai      http://dbpedia.org/resource/Snehal_Ambekar    "female"@en    12442373
    http://dbpedia.org/resource/Yuncheng    http://dbpedia.org/resource/Wang_Yuyan        "female"@en    5134779
    

    Which seems about right to me.
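
    If only the single biggest city is wanted, as in the title of the question, the same pattern can be narrowed down. The following is a sketch under the same assumptions as the query above (predefined dbo:, dbp:, foaf: and rdfs: prefixes); the DISTINCT, the case-insensitive comparison, and the LIMIT are additions for illustration, not part of the original query.

    ```sparql
    # Biggest settlement whose linked mayor/leader is recorded as female.
    SELECT DISTINCT ?city ?mayor ?pop WHERE {
      ?plebs rdfs:subClassOf dbo:Settlement .
      ?city a ?plebs ;
            dbo:populationTotal ?pop ;
            (dbp:mayor | dbo:mayor | dbp:leader | dbo:leader | dbo:leaderName) ?mayor .
      ?mayor foaf:gender ?gender .
      # compare case-insensitively in case the gender literal is capitalised
      FILTER(lcase(str(?gender)) = "female")
    }
    ORDER BY DESC(?pop)
    LIMIT 1
    ```

    The result depends on the DBpedia release behind the endpoint, and cities whose mayor is only stored as a string are still missed.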