regex, matlab, matlab-deployment, matlab-compiler

Multiline regexp in MATLAB


Hey guys, I am pretty new to using regexp, so please bear with me if some of my questions are very basic. I need to filter a large file for some data. Part of the data file looks like this:

<abcd.....z>
    <xyz123....etc..etc/>
    <xyz123....etc..etc/>
    .
    .
    many more
    .
    <xyz123....etc..etc/>
</node>
<abcd.....z/>
<abcd.....z/>
<abcd.....z/>
<abcd.....z/>
<abcd.....z>
    <xyz123....etc..etc/>
</node>
<abcd.....z/>

and this pattern repeats multiple times.

My requirement is to get the opening <abcd.....z> that comes directly before each </node>, along with the data in between them (i.e. all the <xyz123....etc..etc/> lines).

For example, output 1:

<abcd.....z>
    <xyz123....etc..etc/>
    <xyz123....etc..etc/>
    .
    .
    many more
    .
    <xyz123....etc..etc/>

output 2:

<abcd.....z>
    <xyz123....etc..etc/>

I have tried this pattern, which uses a positive lookahead:

<abcd.*?>(?=</node>)

The main problem with this is that the output includes the <abcd.....z/> entries that do not have any <xyz123....etc..etc/> underneath them, i.e. the output is as follows.

output 1:

<abcd.....z>
    <xyz123....etc..etc>
    <xyz123....etc..etc>
    .
    .
    many more
    .
    <xyz123....etc..etc>

output 2:

<abcd.....z/>
<abcd.....z/>
<abcd.....z/>
<abcd.....z/>
<abcd.....z>
    <xyz123....etc..etc>
</node>

Notice that in output 2 I do not need the first four <abcd.....z/> entries. I need only the last one, i.e. the output has to be

<abcd.....z>
    <xyz123....etc..etc>

Again, sorry for the long post, and I hope someone can help me out here!

Thanks to @digitalLink I have the right expression.
It is (<abcd\.*z>(.|\n)*?)(?=<\/node>)

This works fine with online tools like regexr and regex101. I noticed that both tools use a g modifier (global modifier) at the end of the regular expression. I understand that the /g expression flag retains the index of the last match, allowing iterative searches.

Is this possible in MATLAB? Do I have to use the g modifier explicitly in MATLAB? What is the equivalent expression flag in MATLAB?

Could someone please help me out here? I am new to MATLAB and have not been able to figure this out!


Solution

  • I think this regular expression may be what you're looking for. Let me know how it works.

    "(<abcd\.*z>(.|\n)*?)(?=<\/node>)"
    

    Make sure to include the multiline flag.

    Test it here: http://regexr.com/3c0aj
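On the MATLAB side of the question, here is a minimal sketch under a few assumptions. MATLAB's regexp has no g modifier: it returns every match by default, and the 'once' option limits it to the first one. The dot also matches newline characters by default in MATLAB, so (.|\n) can be shortened to a plain dot. The sample text in txt is made up to resemble the excerpt in the question.

```matlab
% Hypothetical sample data shaped like the excerpt in the question.
txt = sprintf(['<abcd.....z>\n' ...
               '    <xyz123/>\n' ...
               '    <xyz123/>\n' ...
               '</node>\n' ...
               '<abcd.....z/>\n' ...
               '<abcd.....z/>\n' ...
               '<abcd.....z>\n' ...
               '    <xyz123/>\n' ...
               '</node>\n' ...
               '<abcd.....z/>\n']);

% Same idea as the expression above; '.' already spans line breaks in MATLAB.
expr = '<abcd\.*z>.*?(?=</node>)';

% Default behaviour: all matches, returned as a cell array of char vectors.
blocks = regexp(txt, expr, 'match');

% 'once' returns only the first match, as a char vector.
first = regexp(txt, expr, 'match', 'once');

fprintf('%d blocks found\n', numel(blocks));
disp(blocks{2})
```

Because the pattern contains no ^ or $ anchors, the 'lineanchors' option (MATLAB's counterpart of the multiline flag) should not change the result here.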