
Counting and ranking subprofessions by number of people


I'm currently trying to write a SPARQL query for Wikidata that ranks subprofessions by how many people have that occupation and groups them under their parent profession, with the parent professions in alphabetical order. My final result should look something like this:

Profession | Subprofession | Count
Artist     | Painter       | 34
Artist     | Actor         | 12
Politician | President     | 67
Politician | Minister      | 13

Right now I can only get as far as displaying the parent profession, and I feel I still have a long way to go. Whenever I introduce the subprofession into the query, even just to display it alongside the parent occupation, the query times out. Is this where I should use nested SELECTs? I'm not very familiar with them.

SELECT ?occupation ?suboccupation (count(*) as ?count)
WHERE
{
    ?people wdt:P106 ?occupation . #occupation
    ?suboccupation wdt:P279 ?occupation . #subclassof
}
GROUP BY ?occupation ?suboccupation
ORDER BY DESC(?count)

Thank you everybody in advance!


Solution

  • Well, some professions and subprofessions have no English label, so some of the rows are not very helpful. Also, this list is LONG! You may want to aggregate further or limit the results somehow.

    Here's a starting point for what you want:

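    SELECT ?profLabel ?subprofLabel ?count
    # Named subquery (a Blazegraph extension): do the counting first,
    # then fetch labels only for the aggregated rows.
    WITH {
      SELECT ?prof ?subprof (COUNT(?person) AS ?count) WHERE {
        ?prof wdt:P31 wd:Q28640.      # ?prof is an instance of profession
        ?subprof wdt:P279+ ?prof.     # ?subprof is a subclass of ?prof, at any depth
        ?person wdt:P106 ?subprof.    # ?person has ?subprof as occupation
      }
      GROUP BY ?prof ?subprof
    } AS %main {
      INCLUDE %main .
      # The label service fills in ?profLabel and ?subprofLabel.
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
    # Parent professions alphabetically, then subprofessions by descending count.
    ORDER BY ?profLabel DESC(?count)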
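
    If the full list is too long or still times out, one way to narrow it is the variant below. It is only a sketch under a few assumptions: wd:Q483501 is taken to be "artist" and wd:Q82955 "politician" (the parents from the example table in the question; check the QIDs before relying on them), only direct subclasses are counted (wdt:P279 without the +), which matches the one-level Profession | Subprofession layout asked for, and the HAVING threshold of 100 is an arbitrary cut-off for small groups. Because it follows just one subclass level, occupations nested deeper under a parent are not counted for that parent.

    # Sketch: direct subprofessions of a few chosen parents only.
    # Assumed QIDs: wd:Q483501 = artist, wd:Q82955 = politician.
    SELECT ?profLabel ?subprofLabel ?count
    WITH {
      SELECT ?prof ?subprof (COUNT(?person) AS ?count) WHERE {
        VALUES ?prof { wd:Q483501 wd:Q82955 }  # restrict the parent professions
        ?subprof wdt:P279 ?prof.               # direct subclasses only (no "+")
        ?person wdt:P106 ?subprof.             # people with that occupation
      }
      GROUP BY ?prof ?subprof
      HAVING (COUNT(?person) >= 100)           # arbitrary cut-off, tune as needed
    } AS %main WHERE {
      INCLUDE %main .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
    }
    ORDER BY ?profLabel DESC(?count)

    Swapping wdt:P279 back to wdt:P279+ counts all descendant occupations again, at the cost of a heavier query.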