sparql wikidata

Obtain the average number of distinct values for each of the properties


I am trying to do the following with a SPARQL query:

For each property, obtain the average number of distinct values it takes per instance (e.g., what is the average number of occupations for a Formula 1 driver, what is the average number of teams they have participated in, etc.)


Solution

  • You would need a query like:

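    # Outer query: average the per-instance counts for each property
    SELECT ?p (AVG(?ct) AS ?avg)
    WHERE {
      # Inner query: number of distinct values per instance and property
      SELECT ?s ?p (COUNT(DISTINCT ?o) AS ?ct)
      WHERE {
        ?s ?p ?o .
        # more restrictions...
      }
      GROUP BY ?s ?p
    }
    GROUP BY ?p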
    

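    Note that this approach ignores instances where the count is 0, e.g. an F1 driver with no occupation. Such an instance has no matching triple for the property, so it never reaches the inner GROUP BY, and the average is taken only over instances that have at least one value.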

    Also, the query is likely to time out unless you add more restrictions (for example, limiting ?s to a single class) to reduce the size of the matched data.
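
    If instances without a value should count as 0, one possible sketch is to first collect the properties used by the class and then match the values with OPTIONAL, so that COUNT(DISTINCT ?o) yields 0 when ?o is unbound. The ex: prefix and the ex:FormulaOneDriver class are hypothetical placeholders, not real Wikidata identifiers; substitute whatever restriction selects your instances:

    # Hypothetical namespace and class, used for illustration only
    PREFIX ex: <http://example.org/>

    SELECT ?p (AVG(?ct) AS ?avg)
    WHERE {
      SELECT ?s ?p (COUNT(DISTINCT ?o) AS ?ct)
      WHERE {
        # Restrict the instances to one class
        ?s a ex:FormulaOneDriver .
        # Every property used by at least one instance of that class
        {
          SELECT DISTINCT ?p
          WHERE { ?x a ex:FormulaOneDriver ; ?p ?y . }
        }
        # Leaves ?o unbound when ?s has no value for ?p, so the count is 0
        OPTIONAL { ?s ?p ?o . }
      }
      GROUP BY ?s ?p
    }
    GROUP BY ?p

    This pairs every instance with every property before counting, so it is more expensive than the query above and only practical when the restriction keeps the set of instances small.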