
Need to find empty passages in UIMA Ruta


I need to annotate empty passages in a document. I used a regex pattern to annotate them, but it also covers non-empty passages.

Sample Input file:

<p class="MsoNormal"><a name="para10001">You can easily change the formatting</a></p>
<p class="MsoNormal"><a name="para10002"> </a></p>
<p class="MsoNormal"><a name="para10003"></a></p>
<p class="MsoNormal"><a name="para10004">To change the overall look of your document</a></p>
<p class="MsoNormal"><a name="para10005"></a></p>
<p class="MsoNormal"><a name="para10006"></a></p>

Ruta Script:

   "<p(.*?)><a name=\"para(\\d+)\"></a></p>"->EMPTYPASSAGE;
   "<p(.*?)><a name=\"para(\\d+)\"> </a></p>"->EMPTYPASSAGE;
                         or
   "<p(.*?)><a name=\"para(.+?)\"></a></p>"->EMPTYPASSAGE;
   "<p(.*?)><a name=\"para(.+?)\"> </a></p>"->EMPTYPASSAGE;

Solution

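  • Your regex consumes several <p> tags: (.*?) is not confined to a single tag, so a match can start at an earlier, non-empty <p> and run on until the next empty anchor. Restrict the group to characters other than >, so it cannot leave the opening tag. Try something like: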

    "<p([^>]*?)><a name=\"para(\\d+)\"></a></p>"->EMPTYPASSAGE;
    "<p([^>]*?)><a name=\"para(\\d+)\"> </a></p>"->EMPTYPASSAGE;
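To see the difference outside Ruta, here is a minimal sketch in Python using the sample input from the question. It makes a few assumptions that are not in the original post: the names LOOSE, STRICT and find_matches are made up for the illustration, the two rules are folded into one pattern with an optional space, and "." is allowed to match line breaks (re.DOTALL) to reproduce the reported over-matching. Ruta's own matcher may be configured differently, so treat this as a demonstration of the regex behaviour only.

```python
import re

# Sample input from the question, one paragraph per line.
SAMPLE = "\n".join([
    '<p class="MsoNormal"><a name="para10001">You can easily change the formatting</a></p>',
    '<p class="MsoNormal"><a name="para10002"> </a></p>',
    '<p class="MsoNormal"><a name="para10003"></a></p>',
    '<p class="MsoNormal"><a name="para10004">To change the overall look of your document</a></p>',
    '<p class="MsoNormal"><a name="para10005"></a></p>',
    '<p class="MsoNormal"><a name="para10006"></a></p>',
])

# Assumption: "." may match line breaks, hence re.DOTALL.
# LOOSE mirrors the question's pattern: (.*?) can run past ">" into later tags.
LOOSE = re.compile(r'<p(.*?)><a name="para(\d+)"> ?</a></p>', re.DOTALL)
# STRICT mirrors the answer's pattern: [^>]*? cannot leave the opening <p ...> tag.
STRICT = re.compile(r'<p([^>]*?)><a name="para(\d+)"> ?</a></p>', re.DOTALL)


def find_matches(pattern, text):
    """Return (paragraph id, matched text) for every match of pattern in text."""
    return [(m.group(2), m.group(0)) for m in pattern.finditer(text)]


if __name__ == "__main__":
    for name, pattern in (("loose", LOOSE), ("strict", STRICT)):
        print(name)
        for para_id, matched in find_matches(pattern, SAMPLE):
            # Number of <p> elements swallowed by this single match.
            print("  para" + para_id, "covers", matched.count("<p "), "<p> element(s)")
```

With the loose pattern, the match that ends at para10002 begins at the <p> of para10001, so the non-empty passage is covered as well. With the negated character class, each match stays inside one <p> element.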