How to retrieve more than 10,000 objects from Weaviate?


I have a Weaviate instance where I store 100k documents, and I want to retrieve all of them from the database in some way. Until now I was using REST GET /v1/objects (with a modified Python library) and adding the limit parameter, but that works only up to 10k.

I tried using offset to get past that limit, but as I found in the docs at https://weaviate.io/developers/weaviate/api/graphql/filters#performance-and-resource-considerations--limitations, it doesn't help because of how offset-based pagination is implemented. The docs also state that I could raise QUERY_MAXIMUM_RESULTS above 10k, but that would hurt performance, and I'm not sure how it would scale to really high numbers, since it fetches everything at once.

I want to be able to retrieve all documents from the database for an arbitrarily high number of records, even 500k. I won't be doing this often, so it's fine if it runs slower or in batches, but I want to have this option.

The only solution I came up with is to store all IDs in a different database and query Weaviate in multiple batches, using a filter on id with the OR operator, but that seems too complicated.


Solution

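  • The correct way to do this is the cursor API, which has been available since version 1.18. Instead of an offset, you pass the ID of the last object you received as the after parameter, and Weaviate returns the next batch of objects that follow it.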
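
A minimal sketch of cursor-based retrieval over the same REST endpoint mentioned in the question (GET /v1/objects with the class, limit, and after parameters). The helper names iter_all_objects and make_rest_fetcher are hypothetical, and the base URL http://localhost:8080, the class name Document, and the absence of authentication are assumptions for illustration; adjust them to your setup.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator, List, Optional


def iter_all_objects(
    fetch_page: Callable[[Optional[str], int], List[dict]],
    page_size: int = 1000,
) -> Iterator[dict]:
    """Yield every object by following the cursor until an empty page arrives."""
    cursor = None  # None means "start from the beginning"
    while True:
        page = fetch_page(cursor, page_size)
        if not page:
            return
        yield from page
        # The ID of the last object becomes the cursor for the next request.
        cursor = page[-1]["id"]


def make_rest_fetcher(base_url: str, class_name: str):
    """Build a fetch_page function backed by GET /v1/objects (hypothetical helper)."""

    def fetch_page(after: Optional[str], limit: int) -> List[dict]:
        # The cursor is used together with an explicit class.
        params = {"class": class_name, "limit": limit}
        if after is not None:
            params["after"] = after
        url = f"{base_url}/v1/objects?{urllib.parse.urlencode(params)}"
        with urllib.request.urlopen(url) as resp:
            return json.load(resp)["objects"]

    return fetch_page


# Example usage (assumed URL and class name; requires a running instance):
# fetch = make_rest_fetcher("http://localhost:8080", "Document")
# for obj in iter_all_objects(fetch, page_size=1000):
#     print(obj["id"])
```

Because each request asks only for the objects after a known ID, the page size stays small no matter how many objects the class holds, so the same loop should work for 100k or 500k records without raising QUERY_MAXIMUM_RESULTS. Separating the paging loop from the HTTP call also lets you swap in the Python client or a GraphQL query as the fetcher.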