How to set SRX rule to break both before and after a character


I am updating an SRX rules file, following the SRX specification: www.ttt.org/oscarstandards/srx/srx10.html

The specification does not explicitly say how to make a break both before and after a given piece of text.

In a document, the bullet character \u2022 appears and it needs to be in its own segment, so there needs to be a break both before and after it.

The only solution I came up with is:

<rule break="yes">
    <afterbreak>\u2022</afterbreak>
</rule>
<rule break="yes">
    <beforebreak>\u2022</beforebreak>
</rule>

Is this the correct syntax?


Solution

  • As per the 1.2. Regular Expressions section, 1.2.1. Metacharacters table:

    \uhhhh    Match the character with the hex value hhhh.
    ...
    \x{hhhh}    Match the character with hex value hhhh
    \xhh    Match the character with two digit hex value hh

    You may use either the \uhhhh or the \x{hhhh} notation (the two-digit \xhh form cannot express U+2022), but I guess you may just keep your SRX rules as they are.
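
To sanity-check that the two rules from the question really isolate the bullet, here is a minimal Python sketch that imitates how break="yes" rules are applied. It is not an SRX engine: `segment` and `RULES` are hypothetical names, exception rules (break="no"), rule ordering and any tool-specific whitespace trimming are ignored, and Python's `re` module stands in for the regex flavour described in the specification.

```python
import re

# Each rule is (beforebreak, afterbreak); None means the element is absent.
# These mirror the two break="yes" rules from the question.
RULES = [
    (None, r"\u2022"),  # <afterbreak>\u2022</afterbreak>: break before the bullet
    (r"\u2022", None),  # <beforebreak>\u2022</beforebreak>: break after the bullet
]

def segment(text, rules=RULES):
    """Split text at every position where some rule allows a break."""
    breaks = []
    for pos in range(1, len(text)):
        for before, after in rules:
            # beforebreak must match the text ending exactly at pos
            before_ok = before is None or re.search(rf"(?:{before})\Z", text[:pos])
            # afterbreak must match the text starting exactly at pos
            after_ok = after is None or re.compile(after).match(text, pos)
            if before_ok and after_ok:
                breaks.append(pos)
                break
    starts = [0] + breaks
    ends = breaks + [len(text)]
    return [text[i:j] for i, j in zip(starts, ends)]

print(segment("Intro \u2022 First item"))
# ['Intro ', '•', ' First item']
```

Under these assumptions the bullet ends up alone in its segment, which is the behaviour the two rules are meant to produce. A real SRX tool may additionally trim or reassign the surrounding whitespace.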