I need to annotate empty passages in a document. I used a regex pattern to annotate them, but it also covers non-empty passages.
Sample Input file:
<p class="MsoNormal"><a name="para10001">You can easily change the formatting</a></p>
<p class="MsoNormal"><a name="para10002"> </a></p>
<p class="MsoNormal"><a name="para10003"></a></p>
<p class="MsoNormal"><a name="para10004">To change the overall look of your document</a></p>
<p class="MsoNormal"><a name="para10005"></a></p>
<p class="MsoNormal"><a name="para10006"></a></p>
Ruta Script:
"<p(.*?)><a name=\"para(\\d+)\"></a></p>"->EMPTYPASSAGE;
"<p(.*?)><a name=\"para(\\d+)\"> </a></p>"->EMPTYPASSAGE;
or
"<p(.*?)><a name=\"para(.+?)\"></a></p>"->EMPTYPASSAGE;
"<p(.*?)><a name=\"para(.+?)\"> </a></p>"->EMPTYPASSAGE;
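Your regex consumes several <p> tags. The lazy (.*?) is still allowed to match ">" characters, so when the paragraph at the current start position is not empty, the match keeps expanding across the following paragraphs until it reaches an empty one. Restrict that group to the attributes of a single opening tag with [^>]. Try something like: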
"<p([^>]*?)><a name=\"para(\\d+)\"></a></p>"->EMPTYPASSAGE;
"<p([^>]*?)><a name=\"para(\\d+)\"> </a></p>"->EMPTYPASSAGE;
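
To see the difference outside of Ruta, here is a minimal Python sketch that runs both patterns over the sample input. It is only an illustration: it assumes the dot is allowed to match line breaks (re.DOTALL), which may or may not mirror your Ruta setup, and the names LOOSE, STRICT and COMBINED are made up for this example. COMBINED is an extra suggestion, not part of the rules above: it uses \s* so that one pattern covers both the empty and the whitespace-only anchor.

```python
import re

# Sample document from the question, one paragraph per line.
DOC = "\n".join([
    '<p class="MsoNormal"><a name="para10001">You can easily change the formatting</a></p>',
    '<p class="MsoNormal"><a name="para10002"> </a></p>',
    '<p class="MsoNormal"><a name="para10003"></a></p>',
    '<p class="MsoNormal"><a name="para10004">To change the overall look of your document</a></p>',
    '<p class="MsoNormal"><a name="para10005"></a></p>',
    '<p class="MsoNormal"><a name="para10006"></a></p>',
])

# Original pattern: the lazy (.*?) may run past '>' into later paragraphs.
# re.DOTALL is an assumption so that '.' also crosses the line breaks.
LOOSE = re.compile(r'<p(.*?)><a name="para(\d+)"></a></p>', re.DOTALL)

# Suggested pattern: [^>]*? cannot leave the opening <p ...> tag.
STRICT = re.compile(r'<p([^>]*?)><a name="para(\d+)"></a></p>')

# Hypothetical merged variant: \s* accepts an empty or whitespace-only anchor.
COMBINED = re.compile(r'<p([^>]*?)><a name="para(\d+)">\s*</a></p>')

loose_matches = [m.group(0) for m in LOOSE.finditer(DOC)]
strict_ids = [m.group(2) for m in STRICT.finditer(DOC)]
combined_ids = [m.group(2) for m in COMBINED.finditer(DOC)]

print("loose, first match:", repr(loose_matches[0]))
print("strict ids:", strict_ids)
print("combined ids:", combined_ids)
```

The first match of the loose pattern starts at para10001 and only ends at para10003, so it swallows the non-empty paragraph, while the strict pattern reports only the empty ones. If the \s* variant behaves the same way in your Ruta version (untested here), the two rules could be merged into one.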