
Schematron strip trailing period and space at the end of the text


Does anyone know how to remove the trailing period and space at the end of a list item when the li element contains child elements?

Input XML:

   <ul>
     <li>example1. </li>
     <li>example2.</li>
      <li>xyz size. <ph>567</ph> 1. <ph>9</ph>mm.</li>
      <li>abc size. <ph>1234</ph> 1. <ph>9</ph>mm. </li>
      <li>def size.<ph>123</ph> 3.<ph>5</ph>mm.</li>
   </ul>

The code below does not work properly when the li has child elements: it strips the period from every text node, not just the last one.

Schematron:

       <sch:pattern>
        <sch:rule context="li//text()">
            <sch:report test="matches(., '(\w+)\.\s*$')" sqf:fix="listPeriod" role="warning">List
                should not end with a period</sch:report>
            <sqf:fix id="listPeriod" use-when="matches(., '(\w+)\.\s*$')">
                <sqf:description>
                    <sqf:title>Remove end period</sqf:title>
                </sqf:description>
                <sqf:stringReplace regex="(\w+)\.\s*$" select="'$1'"/>
            </sqf:fix>
        </sch:rule>
    </sch:pattern>

Actual output:

   <ul>
      <li>example1</li>
      <li>example2</li>
      <li>xyz size<ph>567</ph> 1<ph>9</ph>mm</li>
      <li>abc size<ph>1234</ph> 1<ph>9</ph>mm</li>
      <li>def size<ph>123</ph> 3<ph>5</ph>mm</li>
   </ul>

Desired output:

   <ul>
     <li>example1</li>
     <li>example2</li>
      <li>xyz size. <ph>567</ph> 1. <ph>9</ph>mm</li>
      <li>abc size. <ph>1234</ph> 1. <ph>9</ph>mm</li>
      <li>def size.<ph>123</ph> 3.<ph>5</ph>mm</li>
   </ul>

Thanks!!


Solution

  • Fixing text in mixed content is always hard, but in your case it is enough to fix the last text node inside each li element.

    First, use the li element as the rule context, so that the test runs once against the whole string value of the li instead of against each text node inside it:

    <sch:rule context="li">
    

    Then add a match attribute to sqf:stringReplace so that the replacement is applied only to the last text node inside the li:

    <sqf:stringReplace  match="(.//text())[last()]"/>
    

    The whole pattern then looks like this:

    <sch:pattern>
        <sch:rule context="li">
            <sch:report test="matches(., '(\w+)\.\s*$')" sqf:fix="listPeriod" role="warning">List
                should not end with a period</sch:report>
            <sqf:fix id="listPeriod">
                <sqf:description>
                    <sqf:title>Remove end period</sqf:title>
                </sqf:description>
                <sqf:stringReplace regex="(\w+)\.\s*$" match="(.//text())[last()]" select="'$1'"/>
            </sqf:fix>
        </sch:rule>
    </sch:pattern>
    

    Note: You can drop the use-when attribute, because the fix is only offered when the report fires anyway.
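
    A remark on the parentheses in the match expression, as a small illustration rather than part of the original fix: a predicate such as [last()] binds more tightly than //, so the two XPath expressions below select different nodes. The sample li is taken from the question, and the comments use XPath 2.0 comment syntax.

    (: Context node: <li>xyz size. <ph>567</ph> 1. <ph>9</ph>mm.</li> :)

    (: Last text node of the whole li in document order: "mm." :)
    (.//text())[last()]

    (: Last text child of each element: "567", "9" and "mm." :)
    .//text()[last()]

    If a li could end with a whitespace-only text node (for example after a trailing child element in pretty-printed markup, a case the sample input does not show), the last text node would not contain the period and the replacement would change nothing. One possible variation, offered as an untested assumption, is to select the last text node that is not whitespace-only:

    <sqf:stringReplace regex="(\w+)\.\s*$" match="(.//text()[normalize-space()])[last()]" select="'$1'"/>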