regex, regex-group

Regex to match a custom tag and get its groups?


I'm trying to create custom markup. Without an attribute it looks like <mark text mark>, and the match has one group: text. With an attribute it looks like <mark:attribute text mark>: a colon follows <mark with no space, and the attribute follows the colon with no space. That match has two groups: the first is the attribute value after the colon, the second is text.

Example

  • <mark text mark> must match
  • <mark:attribute text mark> must match
<mark
text
mark>

must match

<mark:attribute
text
mark>

must match


  • <marktextmark> should not match
  • <mark> should not match
  • <mark:attributetextmark> should not match
  • <mark:attribute textmark> should not match
  • <mark: text mark> should not match

  • <mark:red ...blah...blah... mark> must match. First group is red, Second group is ...blah...blah...
  • <mark Lorem Ipsum mark> must match. The group is Lorem Ipsum

Uppercase input such as <MARK TEXT MARK> might make matching harder. Supporting it is optional, as long as it doesn't break the other cases.

Summary

  • Must start with <mark
  • If there is an attribute, it should be written with a colon without spaces. <mark:attribute
  • Must end with mark>
  • There must be whitespace before and after the text. <mark:attribute text mark> <mark text mark>
  • Without an attribute there must be exactly one group. <mark text mark> Group: text
  • With an attribute there must be two groups. <mark:attribute text mark> Group[0]: attribute, Group[1]: text
  • There must be no space after the colon, so the attribute value cannot be empty.
  • Multi-line input must be supported.

I tried a regex, (<mark:([^*].+?)mark>), but couldn't get it to work. I hope the explanation is clear. https://regex101.com/r/jNsM88/1

Thanks for your help.


Solution

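  • Group 0 is always the entire match, so captured groups start at 1: your targets will be captured in groups 1 and 2 (not 0 and 1 as you desire).

    Use an optional (i.e. quantifier ?) non-capturing group ((?:...)) for the attribute, and capture runs of non-whitespace with \S+. Because \s also matches newlines, the multi-line examples match without extra flags. Note that (\S+) captures a single whitespace-free token, so multi-word text such as Lorem Ipsum needs a looser second group.

    <mark(?::(\S+))?\s+(\S+)\s+mark>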
    

    See live demo.
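As a rough check, the pattern above can be run against the examples from the question. The sketch below uses Python's `re` module. The names `ANSWER`, `VARIANT`, and `parse` are hypothetical, and `VARIANT` is an assumption rather than part of the answer: it loosens the second group so the text may contain inner whitespace, which the `Lorem Ipsum` example seems to need. `re.IGNORECASE` is added only to cover the optional uppercase case.

```python
import re

# Pattern from the answer: optional ":attribute" (group 1),
# then a single whitespace-free token as the text (group 2).
ANSWER = re.compile(r"<mark(?::(\S+))?\s+(\S+)\s+mark>", re.IGNORECASE)

# Hypothetical variant (an assumption, not from the answer): the text may
# contain inner whitespace but must start and end with a non-whitespace char.
VARIANT = re.compile(
    r"<mark(?::(\S+))?\s+(\S(?:[\s\S]*?\S)?)\s+mark>", re.IGNORECASE
)


def parse(pattern, s):
    """Return (attribute, text) for the first tag in s, or None if no tag matches.

    attribute is None when the tag has no ":attribute" part.
    """
    m = pattern.search(s)
    return m.groups() if m else None


if __name__ == "__main__":
    samples = [
        "<mark text mark>",
        "<mark:attribute text mark>",
        "<mark:attribute\ntext\nmark>",
        "<mark:red ...blah...blah... mark>",
        "<mark Lorem Ipsum mark>",
        "<mark: text mark>",
        "<mark:attribute textmark>",
    ]
    for s in samples:
        print(repr(s), parse(ANSWER, s), parse(VARIANT, s))
```

With the answer's pattern, `<mark Lorem Ipsum mark>` yields no match because `\S+` cannot span the inner space. The variant returns `(None, 'Lorem Ipsum')` for it and agrees with the original on the other samples. In Python, `m.group(1)` and `m.group(2)` hold the attribute and the text, and `m.group(0)` is the whole match.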