marklogic

Using predicates on a large database?


I have a database of 50,000,000 documents, and I'd like to write the base URI of each document to a file. Running over all 50,000,000 documents at once takes too long (the query times out), so I thought I'd use predicates to break the database into more manageable batches. I tried the following to get a handle on performance:

for $i in ( 49999000 to 50000000 )
return fn:base-uri( /mainDoc[position()=$i] )

But performance was very slow for these 1,000 base URIs; in fact, the query timed out. I tried a similar query and got similar results (or rather, a similar lack of results):

for $i in ( /mainDoc ) [ 49999000 to 50000000 ]
return fn:base-uri( $i ) 

Is there a more performant way to loop through a large database, so that documents at the end of the database are as quick to obtain as those at the beginning?


Solution

  • If you just want the document URIs, that's easy. Ensure you have the URI lexicon enabled and run a cts:uris() call.

    If you instead want to follow your approach and jump ahead in the document list to do something with each document, you can run the search unfiltered to make it fast:

    for $item in cts:search(/mainDoc, cts:and-query(()), "unfiltered")[49999000 to 50000000]
    return base-uri($item)
    

    The cts:and-query(()) is shorthand for an always-true query.
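
    For the URI-only case, a minimal sketch of reading the lexicon in batches might look like the following. It assumes the URI lexicon is enabled; the variable names $start and $batch-size and the batch size of 10,000 are illustrative choices, not part of the original answer. The first argument to cts:uris() is an inclusive starting value, so a caller paging through the lexicon would pass the last URI of one batch as $start for the next and drop the first item of that next batch.

    ```xquery
    xquery version "1.0-ml";

    (: Hypothetical batch parameters, chosen for illustration only. :)
    declare variable $start as xs:string := "";        (: "" starts at the first URI :)
    declare variable $batch-size as xs:integer := 10000;

    (: Read one batch of URIs straight from the URI lexicon.
       No documents are loaded, so the cost does not depend on
       how far into the database the batch begins. :)
    cts:uris($start, fn:concat("limit=", $batch-size))
    ```

    Because each batch is resolved from the lexicon rather than by walking documents, a batch near the end of the database should cost about the same as one near the beginning.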