
How to group and count items in SPARQL, accumulating low-hit entries?


How do I count grouped entries in SPARQL, merging entries whose count falls below a given threshold?

Consider, for example, the Nobel Prize data. I can count the occurrences of each family name with a query like:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name (count(*) as ?count) WHERE {
  ?id foaf:familyName ?name
}
GROUP BY ?name
ORDER BY DESC(?count)

How do I modify the query so that it returns only the family names occurring at least 3 times, accumulating all other names under a single "Other" entry?


Solution

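  • Wrap your SELECT in an outer one: the inner query counts each name, and the outer query relabels every name with a count below 3 as "Other" and sums the counts per label.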

    Query

    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    
    SELECT ?name_ (SUM(?count) AS ?count_) {
      {
        SELECT ?name (COUNT(*) AS ?count) { 
          ?id foaf:familyName ?name
        } GROUP BY ?name
      }
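      # Relabel names seen fewer than 3 times as "Other".
      BIND (IF(?count > 2, ?name, "Other") AS ?name_)
    }
    GROUP BY ?name_
    # Sort by count, forcing the "Other" bucket to the end.
    ORDER BY DESC(IF(?name_ = "Other", -1, ?count_))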
    

    Results

       name_       count_  
     ----------- --------- 
       Smith         5     
       Fischer       4     
       Wilson        4     
       Lee           3     
       Lewis         3     
       Müller        3     
       Other       878
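
To make the two stages of the nested query easier to follow, here is a minimal Python sketch of the same logic on in-memory data. It is an illustration only, not part of the original answer: the function name `group_with_other`, its parameters, and the sample names are all hypothetical, and the data is made up rather than taken from the Nobel Prize dataset.

```python
from collections import Counter


def group_with_other(names, min_count=3, other_label="Other"):
    """Count names, folding those seen fewer than min_count times into one bucket.

    Hypothetical helper mirroring the nested SPARQL query above.
    """
    # Stage 1 (the inner SELECT ... GROUP BY ?name): count occurrences per name.
    per_name = Counter(names)

    # Stage 2 (the BIND plus the outer GROUP BY ?name_): relabel rare names,
    # then sum the counts that now share a label.
    merged = Counter()
    for name, count in per_name.items():
        label = name if count >= min_count else other_label
        merged[label] += count

    # Final ordering (the ORDER BY): counts descending, with the bucket last.
    return sorted(
        merged.items(),
        key=lambda item: (item[0] == other_label, -item[1]),
    )


# Made-up sample data for illustration.
sample = ["Smith"] * 5 + ["Lee"] * 3 + ["Curie"] * 2 + ["Bohr"]
for label, total in group_with_other(sample):
    print(label, total)
```

As in the query, a name that is literally "Other" in the data would be merged into the bucket, so a label that cannot occur as a real name is the safer choice if that matters.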