marklogic, marklogic-9

cts:uri-match to pick a particular format


In my MarkLogic database, we have documents whose URIs follow formats such as:

/documents/12345.xml
/documents/12-abc.xml
/documents/abc-123-def.xml
/12345.xml

I want to use a regex in cts:uri-match to pick only those URIs which conform to the format

> /documents/{integer-values}.xml

Please suggest how to make this work. There are millions of documents in the database, and I want to pick only the URIs conforming to the above format, because I will be running a CORB process on those documents for the transformation. I don't want to fetch all the URIs and then run fn:matches over them.


Solution

  • Unfortunately, cts:uri-match takes a wildcard pattern, not a regex, so the closest you can get is a pattern like "/documents/*.xml". Depending on your dataset, that alone may already cut down the number of results drastically. You can then filter out the false positives with an additional fn:matches predicate. Something like:

    cts:uri-match('/documents/*.xml')[fn:matches(., '^/documents/\d+\.xml$')]
    

    This is perhaps a little less efficient than passing in a regex directly, but much better than running a regex over every URI in the database. It should work just fine with millions of URIs.
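
    Since the URIs are headed for CORB, the same expression could sit in a URIs module. This is only a minimal sketch, assuming the URI lexicon is enabled on the database (cts:uri-match needs it) and the usual CORB convention that the URIs module returns the count first, followed by the URIs. The variable name $uris is just illustrative:

    ```xquery
    xquery version "1.0-ml";

    (: Sketch of a CORB URIs module; assumes the URI lexicon is enabled. :)

    (: Step 1: the wildcard narrows the candidates using the URI lexicon.
       Step 2: the regex predicate keeps only /documents/{digits}.xml. :)
    let $uris :=
      cts:uri-match("/documents/*.xml")
        [fn:matches(., "^/documents/\d+\.xml$")]

    (: CORB expects the number of URIs first, then the URIs themselves. :)
    return (fn:count($uris), $uris)
    ```

    With the sample URIs from the question, only /documents/12345.xml would pass the predicate: /documents/12-abc.xml and /documents/abc-123-def.xml match the wildcard but fail the regex, and /12345.xml never matches the wildcard at all.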

    HTH!