I cannot figure out how to write rules in which tokens involve a combination of words and punctuation. How should I handle punctuation in LanguageTool rules?
I have looked online and tried a couple of things to no avail.
For instance, all three of 1)
<rule id="THAT_AND_THAN_DOT" name="that and than dot">
<pattern>
<token>that</token>
<token regexp="yes">
another.|
himself.|
herself.|
itself.</token>
</pattern>
<message>Did you mean <suggestion>than \2.</suggestion>?</message>
<example correction='than another.'>Yes, better <marker>than another. </marker></example>
</rule>
2)
<rule id="THAT_AND_THAN_DOT" name="that and than dot">
<pattern>
<token>that</token>
<token regexp="yes">
another|
himself|
herself|
itself</token>
<token regexp="yes">
[.]</token>
</pattern>
<message>Did you mean <suggestion>than \2.</suggestion>?</message>
<example correction='than another.'>Yes, better <marker>than another. </marker></example>
</rule>
and 3)
<rule id="THAT_AND_THAN_DOT" name="that and than dot">
<pattern>
<token>that</token>
<token regexp="yes">
another|
himself|
herself|
itself</token>
<token regexp="yes">
[:punct:]</token>
</pattern>
<message>Did you mean <suggestion>than \2.</suggestion>?</message>
<example correction='than another.'>Yes, better <marker>than another. </marker></example>
</rule>
failed. On the other hand,
<rule id="THAT_AND_THAN_DOT" name="that and than dot">
<pattern>
<token>that</token>
<token regexp="yes">
another|
himself|
herself|
itself</token>
</pattern>
<message>Did you mean <suggestion>than \2.</suggestion>?</message>
<example correction='than another.'>Yes, better <marker>than another. </marker></example>
</rule>
works, albeit without accounting for the dot, which I would like to do.
Note: I am using LanguageTool inside TeXstudio.
Your code in 2) almost works; the only problem is that you have the token "that" in the pattern but "than" in your example sentence, so the example will never match, independent of the punctuation. In general, punctuation gets its own token during tokenization, so it needs its own token in the pattern, too. You can test your rules with http://community.languagetool.org/ruleEditor/expert; in case of problems it will also show a message with the applied tokenization.
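For reference, here is a minimal sketch of how 2) could look with that fix applied (untested; apart from joining the regexp onto one line, the only functional change is that the example's marker now contains "that", so it actually matches the pattern):
<rule id="THAT_AND_THAN_DOT" name="that and than dot">
 <pattern>
  <token>that</token>
  <!-- the regexp alternatives from 2), written on one line -->
  <token regexp="yes">another|himself|herself|itself</token>
  <!-- the dot keeps its own token, as in 2) -->
  <token regexp="yes">[.]</token>
 </pattern>
 <message>Did you mean <suggestion>than \2.</suggestion>?</message>
 <!-- the example sentence now uses "that"; the correction stays "than another." -->
 <example correction='than another.'>Yes, better <marker>that another.</marker></example>
</rule>
Here \2 refers to the second matched token ("another"), so the suggestion expands to "than another.", which is exactly what the example's correction attribute expects.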