Tags: rdf, sparql, linked-data, linkedmdb

LinkedMDB SPARQL query returns fewer results than expected?


Slide 14 of "Ontology Alignment Discovery using Linked Open Data" says that there are 50,603 actors in the LinkedMDB dataset. Using the following query, I get 2,500. Who is wrong here? Is something missing from my query, or why do the slides report such a high number? This is the SPARQL query:

SELECT (COUNT(DISTINCT ?actors) AS ?nActors) WHERE {
  ?actors a <http://data.linkedmdb.org/resource/movie/actor> .
}
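For reference, a minimal sketch of sending this query over the SPARQL 1.1 Protocol and reading the count back from the JSON results, using only Python's standard library. The endpoint URL is an assumption (the public LinkedMDB service may no longer be online), and `build_url`, `run_query`, and `read_count` are hypothetical helper names for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumption: LinkedMDB's endpoint URL; the public service may no longer be online.
ENDPOINT = "http://data.linkedmdb.org/sparql"

COUNT_QUERY = """SELECT (COUNT(DISTINCT ?actors) AS ?nActors) WHERE {
  ?actors a <http://data.linkedmdb.org/resource/movie/actor> .
}"""


def build_url(query: str, endpoint: str = ENDPOINT) -> str:
    # SPARQL 1.1 Protocol: the query travels URL-encoded in the `query` parameter of a GET.
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_query(query: str, endpoint: str = ENDPOINT, timeout: float = 60.0) -> dict:
    """Send the query and return the parsed SPARQL 1.1 JSON results."""
    req = urllib.request.Request(
        build_url(query, endpoint),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def read_count(result: dict, var: str = "nActors") -> int:
    # An aggregate query returns one row; the count is a literal whose value is a string.
    return int(result["results"]["bindings"][0][var]["value"])
```

`read_count(run_query(COUNT_QUERY))` returns whatever number the endpoint reports; on its own it cannot tell you whether the endpoint truncated anything before counting.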



Solution

  • Many public endpoints impose limits on queries so that one badly behaved client cannot bring down the service or degrade performance for other users.

    The specific service you are asking about appears to cap results at 2,500, as other answers on this site discuss.

    Some services may also have execution time limits that prevent queries running beyond a certain amount of time.

    You can normally work around this limitation by using LIMIT and OFFSET to request pages of results. Unfortunately, that probably won't help your query: it uses an aggregate, and the service appears to apply the limit before aggregating. If you have other queries where LIMIT and OFFSET paging would work (i.e. queries without aggregation), you may also need to add an ORDER BY; without one, depending on the SPARQL service, you may just receive the same results repeatedly.
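A minimal Python sketch of that paging approach applied to a non-aggregate version of the question's query: request `SELECT DISTINCT ?actor` in ORDER BY'd pages and count the distinct URIs on the client instead of with COUNT. It assumes the endpoint honors LIMIT and OFFSET on such a query and that 2,500 is its cap; `fetch`, `page_query`, `count_actors`, and the in-memory `make_fake_fetch` are hypothetical illustrations, not part of any LinkedMDB API.

```python
import re
from typing import Callable, Dict, List

# One row of a SPARQL 1.1 JSON result: variable name -> {"type": ..., "value": ...}
Binding = Dict[str, Dict[str, str]]

ACTOR_CLASS = "http://data.linkedmdb.org/resource/movie/actor"
PAGE_SIZE = 2500  # assumed to be no larger than the endpoint's result cap


def page_query(offset: int, limit: int = PAGE_SIZE) -> str:
    """Build one page of a non-aggregate query; ORDER BY keeps the paging order stable."""
    return (
        "SELECT DISTINCT ?actor WHERE {\n"
        f"  ?actor a <{ACTOR_CLASS}> .\n"
        "}\n"
        f"ORDER BY ?actor LIMIT {limit} OFFSET {offset}"
    )


def count_actors(fetch: Callable[[str], List[Binding]], page_size: int = PAGE_SIZE) -> int:
    """Request pages until one comes back short, then count the distinct URIs client-side.

    `fetch` is any callable that sends a query and returns its result bindings.
    If page_size exceeds the endpoint's real cap, a capped page would be mistaken
    for the last page, so keep page_size at or below the cap.
    """
    seen = set()
    offset = 0
    while True:
        rows = fetch(page_query(offset, page_size))
        seen.update(row["actor"]["value"] for row in rows)
        if len(rows) < page_size:  # a short or empty page means no more results
            return len(seen)
        offset += page_size


def make_fake_fetch(uris: List[str], cap: int = PAGE_SIZE) -> Callable[[str], List[Binding]]:
    """Hypothetical in-memory stand-in for a capped endpoint, for trying the loop offline."""
    ordered = sorted(set(uris))

    def fetch(query: str) -> List[Binding]:
        limit = int(re.search(r"LIMIT (\d+)", query).group(1))
        offset = int(re.search(r"OFFSET (\d+)", query).group(1))
        page = ordered[offset : offset + min(limit, cap)]
        return [{"actor": {"type": "uri", "value": u}} for u in page]

    return fetch
```

With the stand-in, `count_actors(make_fake_fetch(uris))` over 6,000 distinct URIs takes three requests (2,500 + 2,500 + 1,000 rows) and returns 6000. Against a real endpoint, `fetch` would wrap an HTTP request and return `result["results"]["bindings"]`; keep in mind that an ORDER BY over a whole class may itself run into the execution time limits some services enforce.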