Tags: xml, schematron, oxygenxml

Creating a Schematron to flag Latinisms (etc., i.e., e.g.), but it's also flagging words that contain those letters


I have a Schematron schema that flags Latinisms in a topic. It works a little too well: it also flags words that merely contain the same combination of letters. For example, it needs to flag "etc" but it also flags "ketchup", because "ketchup" has "etc" in the middle. I don't know what to change in my code so that it only flags the actual Latinism and not other words.

Here is my code so far:

<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
queryBinding="xslt2">
   <sch:let name="words" value="' i.e, etc., e.g., vs, et al, circa'"/>
    <sch:let name="wordsToMatch" value="replace($words, ',', '|')"/>
    <sch:pattern id = "LatinismsCheck">
    <sch:rule context="text()">
        <sch:report test="matches(., $wordsToMatch)" role="warn">
            The following words should not be added in the topic:
            <sch:value-of select="$words"/>
           </sch:report>
        </sch:rule>
    </sch:pattern>
</sch:schema>

Solution

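  • You can mark the word boundary in the regular expression with '\b'. Note that '\b' is not part of the standard XPath 2.0 regex syntax; the ';j' flag passed to matches() below is a Saxon extension that switches to Java regex syntax, where '\b' is available. Something like this: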

    <sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
    queryBinding="xslt2" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <sch:let name="words" value="'i.e.,etc.,e.g.'"/>
    <sch:let name="wordsToMatch">
        <xsl:for-each select="tokenize($words,',')">
            <!-- Escape literal dots so that "etc." does not also match "etch" -->
            <xsl:value-of select="concat('(\b', replace(normalize-space(.), '\.', '\\.'), ')')"/>
            <xsl:if test="position() != last()">
                <xsl:value-of select="'|'"/>
            </xsl:if>
        </xsl:for-each>
    </sch:let>
    
    <sch:pattern>
        <sch:rule context="text()">
            <sch:report test="matches(., string($wordsToMatch), ';j')" role="warn">
                The following words should not be added in the topic: <sch:value-of select="$words"/>
            </sch:report>
        </sch:rule>
    </sch:pattern>
    </sch:schema>
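
  • If the Saxon-specific ';j' flag is not available in your processor, a rough alternative is to emulate the word boundary with constructs that standard XPath 2.0 regular expressions do support: '(^|\W)' before the term and '(\W|$)' after it. The sketch below is only an illustration under that assumption. The variable name "latinisms" and the pattern id "LatinismsPortable" are made up for this example, the alternation is written by hand instead of being built from a comma-separated list, and every literal dot is escaped so that it does not match an arbitrary character.

```xml
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron"
    queryBinding="xslt2">
    <!-- Hypothetical variable: hand-written alternation, literal dots escaped.
         (^|\W) and (\W|$) stand in for \b, which standard XPath regex lacks. -->
    <sch:let name="latinisms"
        value="'(^|\W)(i\.e\.|etc\.|e\.g\.|vs|et al|circa)(\W|$)'"/>
    <sch:pattern id="LatinismsPortable">
        <sch:rule context="text()">
            <!-- No ';j' flag here, so only standard XPath regex syntax is used -->
            <sch:report test="matches(., $latinisms)" role="warn">
                Avoid Latinisms such as i.e., etc., e.g., vs, et al, and circa in the topic.
            </sch:report>
        </sch:rule>
    </sch:pattern>
</sch:schema>
```

    With this pattern, "ketchup" should no longer be reported, because the "etc" inside it is preceded by a word character and is not followed by a literal dot. The match is case-sensitive as written; passing 'i' as a third argument to matches() would make it case-insensitive.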