I want to remove punctuation from a database of XML documents in MarkLogic, as a preprocessing step for machine learning. I'm new to MarkLogic and don't know how to do that. Is there an XQuery query that can remove punctuation?
To do a mass replacement of all text in the database and strip out punctuation, you could start with something like this (adapted to your needs). Note that it only rewrites text nodes, not attribute values, and runs as a single update transaction:
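xquery version "1.0-ml";
(: \p{P} matches any Unicode punctuation character; narrow it (e.g. "[.,;]") to remove only specific characters :)
for $doc in cts:search(fn:collection(), ())
for $text in $doc//text()[fn:matches(., "\p{P}")] (: skip text nodes with nothing to remove :)
return xdmp:node-replace($text, text { fn:replace($text, "\p{P}", "") })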
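If the query above is slow because it issues one update per text node, an alternative sketch is to rebuild each document's root element once with a recursive `typeswitch` and replace it with a single `xdmp:node-replace` call. This assumes the documents you care about are XML with a root element (non-XML documents are skipped by the `element()` step); `local:strip-punctuation` is a hypothetical helper name, not a built-in. Like the query above, it leaves attribute values untouched.

```xquery
xquery version "1.0-ml";

(: Hypothetical helper: returns a copy of $node with all Unicode
   punctuation (\p{P}) removed from its descendant text nodes. :)
declare function local:strip-punctuation($node as node()) as node()?
{
  typeswitch ($node)
    case element() return
      (: copy the element name and attributes, then recurse into children :)
      element { fn:node-name($node) } {
        $node/@*,
        for $child in $node/node()
        return local:strip-punctuation($child)
      }
    case text() return
      (: drop text nodes that become empty after the replacement :)
      let $cleaned := fn:replace($node, "\p{P}", "")
      return if ($cleaned eq "") then () else text { $cleaned }
    (: comments, processing instructions, etc. are kept as-is :)
    default return $node
};

(: One update per document: replace each XML root element with its cleaned copy.
   Replacing the root node in place keeps the document URI, collections and permissions. :)
for $doc in fn:collection()
let $root := $doc/element()
where fn:exists($root)
return xdmp:node-replace($root, local:strip-punctuation($root))
```

Both versions still run the whole database in one transaction; for a large database you may need to split the work into batches, for example with a batch-processing tool such as CoRB.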