I have a database of 50,000,000 documents, and I'd like to write the base-uri of each document to a file. Running over all 50,000,000 at once takes too long (the query times out), so I thought I'd use predicates to break the database into more manageable batches. To get a handle on the performance, I tried the following:
for $i in ( 49999000 to 50000000 )
return fn:base-uri( /mainDoc[position()=$i] )
But performance was very slow for these 1,000 base URIs; in fact, the query timed out. I tried a similar query and got similar results (or lack of results):
for $i in ( /mainDoc ) [ 49999000 to 50000000 ]
return fn:base-uri( $i )
Is there a more performant way to loop through a large database, so that documents at the end of the database are as quick to retrieve as those at the beginning?
If you just want the document URIs, that's easy: make sure the URI lexicon is enabled on the database and call cts:uris().
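As a minimal sketch (assuming the URI lexicon is enabled; the limit value is just an illustration):

xquery version "1.0-ml";

(: Stream every document URI straight from the URI lexicon;
   no documents are fetched, so position in the database doesn't matter. :)
cts:uris()

(: To page through in batches instead, something like
   cts:uris($last-uri-from-previous-batch, "limit=1000")
   -- the start value is inclusive, so skip the repeated first URI. :)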
To follow your approach of jumping ahead in the document list and doing something with each document, you can run the search unfiltered to make it fast:
for $item in cts:search(/mainDoc, cts:and-query(()), "unfiltered")[49999000 to 50000000]
return base-uri($item)
The cts:and-query(()) is a shortcut way to pass an always-true query.
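If you want to drive this in batches to stay under the timeout, one hedged sketch is to parameterize the range with external variables and invoke the module once per batch (via xdmp:invoke, xdmp:spawn, or a client call); the variable names here are just placeholders:

xquery version "1.0-ml";

(: Hypothetical batch parameters, supplied by whatever invokes this module. :)
declare variable $start as xs:unsignedLong external;
declare variable $size  as xs:unsignedLong external;

(: Unfiltered search with a positional range predicate, so each batch
   returns only $size items starting at position $start. :)
for $item in cts:search(/mainDoc, cts:and-query(()), "unfiltered")
               [$start to ($start + $size - 1)]
return fn:base-uri($item)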