Tags: regex, regex-lookarounds, regex-group, regex-greedy

Not able to select the right data


I have been handed a legacy XML file whose format is not going to change. Formatted, it looks like this:

<Result>
    <StepSequence>
      <RealMeasure>
        <Text value="Batman"/>
      </RealMeasure>
    </StepSequence>
    <StepSequence>
      <RealMeasure>
        <Text value="Superman"/>
      </RealMeasure>
    </StepSequence>
</Result>

In practice it arrives on a single line, like this:

<Result><StepSequence><RealMeasure><Text value="Batman"/></RealMeasure></StepSequence><StepSequence><RealMeasure><Text value="Superman"/></RealMeasure></StepSequence></Result>

The regex I have come up with is:

<RealMeasure><((\w*)\s+value="(.*)".*?)></RealMeasure>

But it matches too much, selecting both elements as a single match:

<RealMeasure><Text value="Batman"/></RealMeasure></StepSequence><StepSequence><RealMeasure><Text value="Superman"/></RealMeasure>

I want to select: <RealMeasure><Text value="Batman"/></RealMeasure>

and

<RealMeasure><Text value="Superman"/></RealMeasure>

I want to get groups so that I can later convert the match to something like: <RealMeasure type="Text" value="Superman"/>

using a replacement pattern like:

<RealMeasure type="$2" value="$3"/>

Any tips to improve my regex?


Solution

  • Try this:

    const reg = /<RealMeasure><((\w+)\s+value="(.*?)".*?)><\/RealMeasure>/g;
    const str = `<Result><StepSequence><RealMeasure><Text value="Batman"/></RealMeasure></StepSequence><StepSequence><RealMeasure><Text value="Superman"/></RealMeasure></StepSequence></Result>`;
    // replace() returns a new string and leaves str unchanged, so keep the result.
    const result = str.replace(reg, `<RealMeasure type="$2" value="$3"/>`);
    console.log(result);
    // <Result><StepSequence><RealMeasure type="Text" value="Batman"/></StepSequence><StepSequence><RealMeasure type="Text" value="Superman"/></StepSequence></Result>
    

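    The (.*) inside value="(.*)" has to be non-greedy as well, i.e. (.*?). A greedy (.*) runs on to the last " in the input that still allows a match, which is why your regex swallowed everything from the first <RealMeasure> to the last </RealMeasure>. I also changed (\w*) to (\w+) so that the type cannot be empty.

    Also, the / in </RealMeasure> has to be escaped as <\/RealMeasure>, because / is the delimiter of a JavaScript regex literal.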
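
As a possible variation on the same idea, here is a minimal sketch that replaces the lazy (.*?) with the negated character class [^"]*, which cannot run past the closing quote no matter how the rest of the pattern backtracks. It assumes the inner element always has the exact form <Name value="..."/> with no other attributes. The helper name flattenRealMeasure is hypothetical, and because the outer group is dropped, the type and value are $1 and $2 here rather than $2 and $3.

```javascript
// Sample input taken from the question.
const xml = '<Result><StepSequence><RealMeasure><Text value="Batman"/></RealMeasure></StepSequence><StepSequence><RealMeasure><Text value="Superman"/></RealMeasure></StepSequence></Result>';

// [^"]* matches any run of characters except a double quote,
// so the value group always stops at the attribute's closing quote.
const pattern = /<RealMeasure><(\w+)\s+value="([^"]*)"\s*\/><\/RealMeasure>/g;

// Hypothetical helper: rewrites each wrapped element into a single tag.
function flattenRealMeasure(input) {
  return input.replace(pattern, '<RealMeasure type="$1" value="$2"/>');
}

// matchAll() requires the g flag and yields one match object per element.
const pairs = Array.from(xml.matchAll(pattern), (m) => ({ type: m[1], value: m[2] }));
const flattened = flattenRealMeasure(xml);

console.log(pairs);
console.log(flattened);
```

Here pairs holds one { type, value } object per <RealMeasure> element, which is handy if you need the captured groups themselves rather than only the rewritten string.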