
Counting number of individuals in SPARQL


I am totally new to SPARQL.

I would like to count the number of actors in this ontology: http://data.linkedmdb.org/directory/actor

I tried the following:

SELECT ?s (COUNT(*) AS ?count)
WHERE
   {
       ?a <http://data.linkedmdb.org/directory/actor> ?s}
 GROUP BY ?s

But I believe that's not the right syntax, because it gives me 0 results, even though I know there are several results in that data source. Could it be that the link is not the correct one?


Solution

    Counting actors per film

    In the original formulation of the question, it appears that you're trying to count actors per film. The query is actually very close to that, but the property URI isn't right, and I'm not sure where you got it from. E.g., if you look at http://data.linkedmdb.org/page/film/1 and right-click on the movie:actor property, you can see that its URI is http://data.linkedmdb.org/resource/movie/actor. Thus, your query could be:

    SELECT ?film (count(*) as ?nActors) WHERE {
      ?film <http://data.linkedmdb.org/resource/movie/actor> ?actor .
    }
    group by ?film
    limit 10
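
    If you'd rather check the property URI against the endpoint than on the HTML page, a minimal sketch like the following lists the properties used on a single film. Note the assumption: the film's resource URI is taken to be the page URL with /resource/ in place of /page/, which I haven't confirmed for this dataset.

    ```sparql
    # List the distinct properties attached to one film.
    # Assumption: the resource URI mirrors http://data.linkedmdb.org/page/film/1
    # with /resource/ substituted for /page/.
    SELECT DISTINCT ?property WHERE {
      <http://data.linkedmdb.org/resource/film/1> ?property ?value .
    }
    ```

    If the assumption holds, http://data.linkedmdb.org/resource/movie/actor should show up among the results.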
    

    Counting actors

    Now, you could modify this query to count actors: keep the same triple pattern, but instead of grouping by film, just count the distinct actors over all the films:

    select (count(distinct ?actor) as ?nActors) where {
      [] <http://data.linkedmdb.org/resource/movie/actor> ?actor .
    }
    

    That returns 162, which seems rather low. In other questions we've seen that LinkedMDB's endpoint has some strange limitations, and the low number might be because of those. This isn't the only way to count actors, though. If you look at an actor's page, e.g., http://data.linkedmdb.org/page/actor/10, you'll see that each actor has the rdf:type http://data.linkedmdb.org/resource/movie/actor, which means that you can just ask for and count things with that type:

    select (count(distinct ?actor) as ?nActors) where {
      ?actor a <http://data.linkedmdb.org/resource/movie/actor> .
    }
    

    That query returns 2500, which seems more plausible, although such a round number suggests it's probably hitting a limit of 2500 in the endpoint, so the real count may be higher.
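
    One way to probe whether 2500 really is a cap is to page through the actors with LIMIT and OFFSET and see whether anything comes back past that point. This is only a sketch: the page size of 100 is arbitrary, and I haven't confirmed that this endpoint honors an OFFSET beyond its own limit.

    ```sparql
    # Fetch one page of actors starting just past the suspected 2500-row cap.
    # ORDER BY keeps the paging stable between requests.
    SELECT DISTINCT ?actor WHERE {
      ?actor a <http://data.linkedmdb.org/resource/movie/actor> .
    }
    ORDER BY ?actor
    LIMIT 100
    OFFSET 2500
    ```

    If this returns any rows, there are more than 2500 actors, and you could keep increasing the offset and add up the rows on the client side to get a full count.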