Tags: java, xml, schematron

Line breaks and spaces in XML Schematron


I have a problem with line breaks, spaces and tabs in XML text content, like this:

<value xs:type="DV_TEXT"><value>1111\this is what it is used for, this could be a   
really long line or even
multiple lines, just like
what you are reading now
</value></value>

The setTextContent and getTextContent methods from org.w3c.dom in Java handle this text without any problem.

But now I am generating Schematron for validation, to check whether this string really appears in the value. The Schematron is generated from a definition file in which the test strings are configured.

In the generated Schematron, the assert test looks like this:

test="(matches(.,'1111\this is what it is used for, this could be a really long line or even&#xD;&#xA;multiple lines, just like&#xD;&#xA;what you are reading now'))"

When I validate, more problems come up. First, the line breaks: the definition file from which the Schematron is generated apparently uses \r\n rather than just \n, and I have to allow for that. If I replace every &#xD;&#xA; with just &#xA;, some of the errors disappear. But how can I be sure that the XML file also uses only &#xA; as its line break?

I think I need to alter the string that goes into the test asserts and, for example, replace every \r\n with just \n.

I have done that, and it partly solves my problem. What else should I think about?

All tips are welcome.


Solution

  • If you want the node text to be valid regardless of how its whitespace is laid out, use the normalize-space function:

    The normalize-space function returns the argument string with whitespace normalized by stripping leading and trailing whitespace and replacing sequences of whitespace characters by a single space. [...]

    So, this should work:

    test="(matches(normalize-space(.),'1111\this is what it is used for, this could be a really long line or even multiple lines, just like what you are reading now'))"
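
Because the asserts are generated from a definition file, the expected string can be normalized the same way before it is written into the test. Note also that matches() treats its second argument as a regular expression, so a backslash (as in '1111\this', where \t would be read as a tab) and other metacharacters probably need escaping. The following is only a sketch under those assumptions: the class and method names are hypothetical and not part of any Schematron library.

```java
import java.util.regex.Pattern;

// Hypothetical helper for generating the assert test; all names are illustrative.
final class SchematronTestBuilder {

    // normalize-space() treats only space, tab, CR and LF as whitespace.
    private static final Pattern XML_WHITESPACE = Pattern.compile("[ \\t\\r\\n]+");

    // Characters that have a special meaning in a regular expression.
    private static final Pattern REGEX_META = Pattern.compile("[\\\\.\\[\\](){}*+?^$|]");

    // Collapse whitespace runs to one space and drop leading/trailing space,
    // mirroring what normalize-space() does on the XPath side.
    static String normalizeSpace(String text) {
        String collapsed = XML_WHITESPACE.matcher(text).replaceAll(" ");
        int start = collapsed.startsWith(" ") ? 1 : 0;
        int end = collapsed.length();
        if (end > start && collapsed.endsWith(" ")) {
            end--;
        }
        return collapsed.substring(start, end);
    }

    // Put a backslash in front of every regex metacharacter.
    static String escapeRegex(String text) {
        return REGEX_META.matcher(text).replaceAll("\\\\$0");
    }

    // Wrap in single quotes; XPath 2.0 escapes an apostrophe by doubling it.
    static String toXPathLiteral(String text) {
        return "'" + text.replace("'", "''") + "'";
    }

    // Escape the characters that may not appear raw in a double-quoted attribute.
    static String escapeXmlAttribute(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    // Build the value for the test attribute of the generated assert.
    static String buildTest(String expected) {
        String pattern = escapeRegex(normalizeSpace(expected));
        String xpath = "matches(normalize-space(.), " + toXPathLiteral(pattern) + ")";
        return escapeXmlAttribute(xpath);
    }

    public static void main(String[] args) {
        // Input with a literal backslash, CRLF, a tab and a trailing newline.
        String expected = "1111\\this  is\r\nmulti\tline \n";
        // Prints: matches(normalize-space(.), '1111\\this is multi line')
        System.out.println(buildTest(expected));
    }
}
```

If the test string is always meant literally, contains(normalize-space(.), '...') may be simpler than matches(), since it does not interpret the string as a regular expression and only the quoting and XML escaping remain.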