I was trying to extract all movies from Linkedmdb. I used OFFSET to make sure I wont hit the maximum number of results per query. I used the following scrip in python
"""
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX movie: <http://data.linkedmdb.org/resource/movie/>
SELECT distinct ?film
WHERE {
?film a movie:film .
} LIMIT 1000 OFFSET %s """ %i
I looped 5 times, with offsets being 0,1000,2000,3000,4000 and recorded the number of results. It was (1000,1000,500,0,0). I already knew the limit was 2500 but I thought by using OFFSET, we can get away with this. Is it no true? There is no way to get all the data (even when we use a loop of some sort)?
Your current query is legal, but but there's no specified ordering, so the offset doesn't bring you to a predictable place in the results. (A lazy implementation could just return the same results over and over again.) When you use limit and offset, you need to also use order by. The SPARQL 1.1 specification says (emphasis added):
15.4 OFFSET
OFFSET causes the solutions generated to start after the specified number of solutions. An OFFSET of zero has no effect.
Using LIMIT and OFFSET to select different subsets of the query solutions will not be useful unless the order is made predictable by using ORDER BY.