In my MarkLogic database, we have documents whose URIs follow formats such as:
/documents/12345.xml
/documents/12-abc.xml
/documents/abc-123-def.xml
/12345.xml
I want to run a regex in cts:uri-match to pick only those URIs which conform to the format
> /documents/{integer-values}.xml
Please suggest how to make this work. There are millions of documents in the database, and I want to pick only the URIs conforming to the above format, because I will be running a CORB process on those documents for the transformation. I don't want to fetch all the URIs and then run fn:matches over them.
Unfortunately, cts:uri-match takes a wildcard pattern, not a regex. The closest you can get is a pattern like "/documents/*.xml". Depending on your dataset, that alone could already trim down the number of results drastically. You can then filter out false positives with an additional fn:matches predicate. Something like:
cts:uri-match('/documents/*.xml')[fn:matches(., '^/documents/\d+\.xml$')]
So, it is perhaps a little less optimal than passing in a regex directly, but better than running a regex over all URIs. It should work just fine with millions of URIs.
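Since you mention CORB, here is a minimal sketch of how the expression above could be dropped into a URIs module. It assumes the URI lexicon is enabled on the database (cts:uri-match needs it) and that your CORB setup follows the usual convention where the URIs module returns the count first, followed by the URIs themselves. Treat it as a starting point rather than a tested module.

```xquery
xquery version "1.0-ml";

(: Sketch of a CORB URIs module; assumes the URI lexicon is enabled. :)

(: Step 1: the wildcard narrows the candidates using the URI lexicon. :)
(: Step 2: fn:matches drops false positives such as /documents/12-abc.xml. :)
let $uris :=
  cts:uri-match("/documents/*.xml")
    [fn:matches(., "^/documents/\d+\.xml$")]

(: Assumed CORB convention: first item is the count, then the URIs. :)
return (fn:count($uris), $uris)
```

With the sample URIs from the question, only /documents/12345.xml passes both steps: /12345.xml is excluded by the wildcard, and the two hyphenated names are excluded by the regex.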
HTH!