Search code examples
sparqlwikidata-query-service

How can get default return values for a SPARQL query when counting


I have a query where I count with a grouping on years. For some of the years there is no items to count and thus no result. I would like to have SPARQL return a count of zero for such years.

I am using the Wikidata Query Service, https://query.wikidata.org, and my present solution is to make a series of values and a union with the actual query. It looks a bit clumsy to me. Is there a better way?

#defaultView:BarChart
select ?year ?number_of_pages ?work_label where {
  {
    select ?year (sample(?pages) as ?number_of_pages) ?work_label       
    where {
      {
        select * where {
          values (?year ?pages ?work_label) {
            ("2000" "0"^^xsd:integer "_") 
            ("2001" "0"^^xsd:integer "_") 
            ("2002" "0"^^xsd:integer "_") 
            ("2003" "0"^^xsd:integer "_") 
            ("2004" "0"^^xsd:integer "_") 
            ("2005" "0"^^xsd:integer "_") 
            ("2006" "0"^^xsd:integer "_") 
            ("2007" "0"^^xsd:integer "_") 
            ("2008" "0"^^xsd:integer "_") 
            ("2009" "0"^^xsd:integer "_") 
            ("2010" "0"^^xsd:integer "_") 
            ("2011" "0"^^xsd:integer "_")
            ("2012" "0"^^xsd:integer "_")
            ("2013" "0"^^xsd:integer "_")
            ("2014" "0"^^xsd:integer "_")
            ("2015" "0"^^xsd:integer "_")
            ("2016" "0"^^xsd:integer "_")
          }
        }
      }
      union {
        ?work wdt:P50 wd:Q18921408 .
        ?work wdt:P1104 ?pages .
        ?work wdt:P577 ?date . 
        ?work rdfs:label ?long_work_label . filter(lang(?long_work_label) = 'en')
        bind(substr(?long_work_label, 1, 20) as ?work_label)
        bind(str(year(?date)) as ?year) 
      }
    } 
    group by ?year ?work ?work_label
    order by ?year 
  }
}

Solution

  • You don't need so many nested queries. Here is a simple solution:

    #defaultView:BarChart
    select ?year ?pages ?label       
    where {
      # iterating over years
      values ?year {
        "2000" "2001" "2002" "2003" "2004" "2005" 
        "2006" "2007" "2008" "2009" "2010" "2011" 
        "2012" "2013" "2014" "2015" "2016" 
      }
    
      # binding defaults
      bind( 0  as ?default_pages)
      bind("_" as ?default_label)
    
      # if there is a work in the given year, ?work_pages and ?work_label will be bound
      optional {
        ?work wdt:P50 wd:Q18921408;
              wdt:P1104 ?work_pages;
              wdt:P577  ?work_date. 
        bind(str(year(?work_date)) as ?year).
    
        ?work rdfs:label ?long_work_label. 
        filter(lang(?long_work_label) = 'en').
        bind(substr(?long_work_label, 1, 20) as ?work_label)
      }
    
      # either take ?work_pages/label value or default and bind it as the result ?pages/label
      bind(coalesce(?work_pages, ?default_pages) as ?pages)
      bind(coalesce(?work_label, ?default_label) as ?label)
    } 
    order by ?year
    

    Here is results screenshot:

    enter image description here


    The key here is the combination of optional + bind/coalesce. The general pattern is

    bind(... as ?default_foo)
    
    optional { 
      # try to get value ?foo
    }
    
    bind(coalesce(?foo, ?default_foo) as ?result_foo)
    

    coalesce returns the first value that it can (that is bound/evaluates without error). So if the value you tried to get in optional { ... } is not bound, the default one will be taken and bound as the result. A more verbose way to write it:

    bind(if(bound(?foo), ?foo, ?default_foo) as ?result_foo)
    

    But coalesce is better because you can pass several values in it. In a more complicated query it can be useful: see this example.