Hey guys, I am pretty new to regular expressions, so please bear with me if some of my questions are very basic. I need to filter a large file for some data. Part of the data file looks like this:
<abcd.....z>
<xyz123....etc..etc/>
<xyz123....etc..etc/>
.
.
many more
.
<xyz123....etc..etc/>
</node>
<abcd.....z/>
<abcd.....z/>
<abcd.....z/>
<abcd.....z/>
<abcd.....z>
<xyz123....etc..etc/>
</node>
<abcd.....z/>
This pattern repeats many times.
What I need is each opening <abcd.....z> that comes right before a </node>, along with the data in between them (i.e. all the <xyz123....etc..etc/> lines).
For example, output 1:
<abcd.....z>
<xyz123....etc..etc/>
<xyz123....etc..etc/>
.
.
many more
.
<xyz123....etc..etc/>
output 2:
<abcd.....z>
<xyz123....etc..etc/>
I have used this positive lookahead:
<abcd.*?>(?=</node>)
But the main problem is that the output also includes the <abcd.....z/> tags that do not have any <xyz123....etc..etc/> underneath them, i.e. the output is as follows:
output 1:
<abcd.....z>
<xyz123....etc..etc>
<xyz123....etc..etc>
.
.
many more
.
<xyz123....etc..etc>
output 2:
<abcd.....z/>
<abcd.....z/>
<abcd.....z/>
<abcd.....z/>
<abcd.....z>
<xyz123....etc..etc>
</node>
Notice that in output 2 I do not need the first four <abcd.....z/> tags. I need only the last one, i.e. the output has to be:
<abcd.....z>
<xyz123....etc..etc>
Again, sorry for the long post, and I hope someone can help me out here!
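To make the over-matching concrete, here is a minimal Python sketch (Python only because it is easy to run; the regex behaviour is the point). It assumes the tags sit on one line with no whitespace between them, which is what `>(?=</node>)` requires, and it uses a shortened, made-up `<xyz123/>` in place of the real child tags. The `strict` pattern is an assumed variant, not something from the post: it only accepts an opening tag whose closing `>` is not preceded by `/`. A lazy `.*?` only controls where a match ends; the match still begins at the first `<abcd` the engine reaches, which is why the self-closing tags get swept in.

```python
import re

# Hypothetical sample: the tags from the post, written without line breaks,
# with shortened <xyz123/> children standing in for the real ones.
sample = (
    "<abcd.....z><xyz123/><xyz123/></node>"
    "<abcd.....z/><abcd.....z/>"
    "<abcd.....z><xyz123/></node>"
    "<abcd.....z/>"
)

# Pattern from the post: the match starts at the first "<abcd" found,
# even when that tag is self-closing, and runs up to the next "</node>".
loose = re.findall(r"<abcd.*?>(?=</node>)", sample)

# Assumed variant: the opening tag must end in ">" that is NOT preceded by "/",
# so self-closing <abcd.../> tags can never start a match.
strict = re.findall(r"<abcd[^>]*(?<!/)>.*?(?=</node>)", sample)

for label, matches in (("loose", loose), ("strict", strict)):
    print(label)
    for m in matches:
        print("   ", m)
```

With the loose pattern the second match starts at the first self-closing tag; with the stricter opening tag it starts at the last `<abcd.....z>` before `</node>`.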
Thanks to @digitalLink I have the right expression.
It is (<abcd\.*z>(.|\n)*?)(?=<\/node>)
This works fine with online tools like regexr and regex101. I noticed that both tools use a g modifier (global modifier) at the end of the regular expression. As I understand it, the /g flag retains the index of the last match, which allows iterative searches so that every match is found.
Is this possible in MATLAB? Do I have to use a g modifier explicitly in MATLAB? What is the equivalent flag in MATLAB?
Could someone please help me out here? I am new to MATLAB and have not been able to figure this out.
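For what it is worth, the "global" behaviour is usually not part of the pattern itself. It comes from how the engine is called. The Python sketch below illustrates the difference between asking for the first match and iterating over all of them, using the expression quoted above on a made-up multi-line sample (the `<xyz123/>` lines are placeholders). As far as I know, MATLAB's regexp already returns every match by default, e.g. with the 'match' output option, and you pass 'once' only when you want just the first one, so no g modifier should be needed there.

```python
import re

# Hypothetical multi-line sample modeled on the data in the question.
sample = """<abcd.....z>
<xyz123/>
<xyz123/>
</node>
<abcd.....z/>
<abcd.....z/>
<abcd.....z>
<xyz123/>
</node>
<abcd.....z/>
"""

# The expression quoted above, unchanged.
pattern = re.compile(r"(<abcd\.*z>(.|\n)*?)(?=<\/node>)")

# "Non-global": search() stops after the first match.
first = pattern.search(sample).group(1)

# "Global": finditer() keeps going from the end of the previous match,
# which is the job the /g flag does in the online tools.
all_blocks = [m.group(1) for m in pattern.finditer(sample)]

print(repr(first))
print(len(all_blocks), "blocks found")
```

Note that `\.*` only matches literal dots, so this works on the placeholder tags shown in the post; real tag contents would need a different opening-tag pattern.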
I think this Regex expression may be what you're looking for. Let me know how it works.
"(<abcd\.*z>(.|\n)*?)(?=<\/node>)"
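Make sure the global flag is set so that every block is matched. The multiline flag only changes how ^ and $ behave, so it makes no difference for this pattern.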
Test it here: http://regexr.com/3c0aj
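If it helps to see which flags actually matter, here is a small Python sketch under the same assumptions as the sample data in the question (the `<xyz123/>` lines are shortened placeholders). It runs the expression with and without the multiline flag, and also tries an assumed alternative that replaces `(.|\n)` with a plain `.` plus a dot-matches-newline flag. All three should return the same blocks, because the pattern contains no `^` or `$` for the multiline flag to act on.

```python
import re

# Hypothetical sample: two blocks, one self-closing tag in between.
sample = (
    "<abcd.....z>\n<xyz123/>\n</node>\n"
    "<abcd.....z/>\n"
    "<abcd.....z>\n<xyz123/>\n<xyz123/>\n</node>\n"
)

# Expression from the answer, unchanged.
answer = r"(<abcd\.*z>(.|\n)*?)(?=<\/node>)"

# Assumed equivalent: let "." match newlines via DOTALL instead of (.|\n).
dotall = r"<abcd\.*z>.*?(?=</node>)"


def blocks(pattern, flags=0):
    """Return the full text of every non-overlapping match in the sample."""
    return [m.group(0) for m in re.finditer(pattern, sample, flags)]


plain = blocks(answer)
with_multiline = blocks(answer, re.MULTILINE)
with_dotall = blocks(dotall, re.DOTALL)

print(plain == with_multiline == with_dotall)
for block in plain:
    print(repr(block))
```

The DOTALL form avoids the alternation inside a repeated group, which some engines handle slowly on large inputs.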