How to remove punctuation from a database in MarkLogic?


I want to remove punctuation from a database of XML documents in MarkLogic, as a preprocessing step for machine learning. I'm new to MarkLogic and don't know how to do this. Is there an XQuery query that can remove punctuation?


Solution

  • To strip punctuation from all text in the database in one pass, you could start with something like the following and adapt it to your needs (as written, the character class only covers periods, commas, and semicolons):

    (: For every text node in every document, replace it with a copy that has . , ; removed. :)
    for $doc in cts:search(fn:collection(), ())
    for $text in $doc//text()
    return xdmp:node-replace($text, text { fn:replace($text, "[\.,;]", "") })
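
Before running an update like this against real data, it may help to preview what a given pattern removes. The following is a minimal sketch that only calls fn:replace on a made-up sample string (the string and the variable name $sample are hypothetical, not from the question), so it does not touch the database. It compares the narrow character class used above with \p{P}, the Unicode punctuation category escape supported by XQuery regular expressions:

```xquery
xquery version "1.0-ml";

(: Hypothetical sample string, used only to preview the patterns. :)
let $sample := "Hello, world; it's (a) test!"
return (
  (: The narrow class from the answer: removes only . , ; :)
  fn:replace($sample, "[\.,;]", ""),
  (: \p{P} matches any character in the Unicode punctuation categories. :)
  fn:replace($sample, "\p{P}", "")
)
(: Returns two strings:
   "Hello world it's (a) test!"
   "Hello world its a test" :)
```

If the second result is closer to what you need, you could swap "\p{P}" into the update query in place of "[\.,;]". Note that \p{P} does not cover symbol characters such as $ or +, which fall under \p{S}. Since the update rewrites every text node, it is probably worth trying it on a small collection or a copy of the data first.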