c# xml regex lookbehind non-greedy

Returning only the first match using Regex Look-Behinds


Given the following XML document:

<root>
    <myGoodSection 
          some="attr" 
          another="attr" 
      />
    <myBadSection yet="anotherattr" />
</root>

How can I return the first /> using Regex? So far I've been able to get pretty close using the following expression:

(?ims)(?<=<myGoodSection.*?)/>

However, this will match every instance of /> that follows the first occurrence of <myGoodSection. I've also tried combining it with a negative look-behind in an effort to make the expression non-greedy, but it does not seem to have any effect:

(?ims)(?<=<myGoodSection.*?)(?<!/>)/>
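To make the behaviour concrete, here is a minimal C# sketch that counts the matches for both expressions above against the sample XML. The class and method names (LookbehindDemo, CountMatches) are made up for illustration, and it assumes direct access to System.Text.RegularExpressions, which the tool in question may not expose.

```csharp
using System;
using System.Text.RegularExpressions;

class LookbehindDemo
{
    // The XML from the question, as a verbatim string ("" is an escaped quote).
    public const string Xml = @"<root>
    <myGoodSection 
          some=""attr"" 
          another=""attr"" 
      />
    <myBadSection yet=""anotherattr"" />
</root>";

    // Hypothetical helper: number of matches of a pattern in the sample XML.
    public static int CountMatches(string pattern) =>
        Regex.Matches(Xml, pattern).Count;

    static void Main()
    {
        // The lazy .*? inside the look-behind can still stretch back across
        // any number of tags, so every later /> also has a valid prefix.
        Console.WriteLine(CountMatches(@"(?ims)(?<=<myGoodSection.*?)/>"));        // 2

        // (?<!/>) only inspects the two characters immediately before the
        // match, which are never "/>" here, so it filters nothing out.
        Console.WriteLine(CountMatches(@"(?ims)(?<=<myGoodSection.*?)(?<!/>)/>")); // 2
    }
}
```

Laziness only changes the order in which the engine tries prefix lengths. It does not stop the look-behind from succeeding with a longer prefix when the match attempt starts at a later />.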

Edit:

I am using a tool built on top of C# to handle the regex replacement. Unlike when using System.Text.RegularExpressions directly, I have no control over how many of the matches are used. I mention C# here only to clarify which features the regex engine supports.

Yes, I am aware that as a matter of general practice I should not be using regex to parse XML. Let's just stipulate that, given my current scope, requirements, and constraints, it is a perfectly acceptable solution (provided there is actually a way to accomplish it).


Solution

  • I was able to accomplish this by replacing .*? with \b[^>]*? so that my final expression becomes:

    (?ims)(?<=<myGoodSection\b[^>]*?)/>
    

    The look-behind now succeeds only when no > appears between <myGoodSection and the />, so the /> of every later tag is excluded. The \b also keeps the prefix from matching longer tag names that merely begin with myGoodSection.
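As a quick check of the final expression, the following C# sketch runs it against the sample XML and then performs a replace-all. The class name FirstCloserDemo and the replacement text ></myGoodSection> are hypothetical and chosen only to show that a single tag is touched. It again assumes direct use of System.Text.RegularExpressions rather than the tool from the question.

```csharp
using System;
using System.Text.RegularExpressions;

class FirstCloserDemo
{
    // The XML from the question, as a verbatim string ("" is an escaped quote).
    public const string Xml = @"<root>
    <myGoodSection 
          some=""attr"" 
          another=""attr"" 
      />
    <myBadSection yet=""anotherattr"" />
</root>";

    // Final expression from the solution.
    public const string Pattern = @"(?ims)(?<=<myGoodSection\b[^>]*?)/>";

    static void Main()
    {
        MatchCollection matches = Regex.Matches(Xml, Pattern);
        Console.WriteLine(matches.Count); // 1

        // The single match is the first "/>" in the document.
        int firstCloser = Xml.IndexOf("/>", StringComparison.Ordinal);
        Console.WriteLine(matches[0].Index == firstCloser); // True

        // Hypothetical replacement text: a replace-all changes only
        // <myGoodSection ... /> and leaves <myBadSection ... /> untouched.
        Console.WriteLine(Regex.Replace(Xml, Pattern, "></myGoodSection>"));
    }
}
```

One caveat with this approach: an attribute value that contains a literal > between <myGoodSection and its /> would make [^>]*? stop short, and the expression would then find no match at all.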