If I take a simple query, such as
match $x isa dog; limit 5; get;
then no matter how many dogs are stored in Grakn, I get 5 results back. This is fine, but what if I don't know how many dogs I want when I make the query and want to limit the number I retrieve later on in my code?
Here's my idea using the Python client:
import itertools
import grakn
client = grakn.Grakn(uri="localhost:48555")
session = client.session(keyspace="dogs_keyspace")
tx = session.transaction(grakn.TxType.WRITE)
results = tx.query('match $x isa dog; get;') # I don't limit now, so I can do it later
results is an iterator, so I can't do this:
limited_results = list(results)[:5]
because then all the dogs are put into the list before I take the first 5, which is really inefficient if I have 1,000,000 dogs in the knowledge graph.
But I can say:
limited_results = list(itertools.islice(results, 5))
and I should get just the first 5 dogs without touching the other 999,995 dogs.
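To check that reasoning without a Grakn server, here is a minimal sketch that swaps in a plain Python generator for the query results. The name fake_dog_results and the counter dictionary are hypothetical, made up for illustration; the sketch only assumes that the real result iterator produces answers one at a time, as a generator does.

```python
import itertools


def fake_dog_results(total, counter):
    """Hypothetical stand-in for the iterator returned by tx.query()."""
    for i in range(total):
        counter["fetched"] += 1  # record every result that is actually produced
        yield f"dog-{i}"


# Eager: list() drains the whole iterator before the slice is applied.
eager_counter = {"fetched": 0}
eager_first_five = list(fake_dog_results(1000, eager_counter))[:5]

# Lazy: islice stops pulling from the iterator after 5 results.
lazy_counter = {"fetched": 0}
lazy_first_five = list(itertools.islice(fake_dog_results(1000, lazy_counter), 5))

print(eager_counter["fetched"], lazy_counter["fetched"])  # prints: 1000 5
```

Both versions end up with the same five items; only the number of results pulled from the iterator differs.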
But my question is: is there any reason this approach would be slower than providing the limit 5 in the query itself, like this?
match $x isa dog; limit 5; get;
If you want to access only the first 5 dogs without Grakn retrieving every dog in the graph, both of your approaches are valid. Both use lazy iterators, meaning that no retrieval is attempted until you explicitly ask for the next result.
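As a rough illustration of what "lazy" means here, the sketch below uses an ordinary Python generator rather than the Grakn client. The function lazy_results and the dog names are hypothetical, and the sketch says nothing about whether the real client fetches results in batches over the network; it only shows that creating the iterator does no retrieval by itself.

```python
def lazy_results(log):
    """Hypothetical stand-in for a lazy result iterator; logs each retrieval."""
    for name in ("rex", "fido", "bella"):
        log.append(name)  # this line runs only when a result is requested
        yield name


retrieval_log = []
results = lazy_results(retrieval_log)

# Creating the iterator has not retrieved anything yet.
retrieved_before_next = list(retrieval_log)

# Asking for one result triggers exactly one retrieval.
first = next(results)
retrieved_after_next = list(retrieval_log)

print(retrieved_before_next, first, retrieved_after_next)
```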
If you issue the query
match $x isa dog; limit 5; get;
directly, Grakn will build an iterator, iterate over it just 5 times, and return the results to the client.