Tags: regex, vb.net, regex-lookarounds

Regex expression to match strings after example codes


Hello, I'm trying to write a regex that matches the text after H codes. This is the regex I have at the moment, along with the sample text:

(?<=H\d\d\d)([.*\w\W\n]+)(?=End word)

https://regex101.com/r/9SBnH9/2

Sample text
Sample text
Sample text
H319 asdkjczixuqweoiurqoiweqwrjasdkjfqwe qweiouqwroiu kjasdkj czkjxklqjwekjiouasdiiaosudou
oiuasodiucxzlkjqweoiu oqiwur H320 asdkqjwe askjdq xzc
H325 asjdhasjd zxcjh
H331+H341+H341 askdjvkjzx qweqrqwoe
End word
Sample text
Sample text

This is the sample text. I want the expression to start matching after it finds an H code (H followed by three digits, e.g. H319) and take the text that follows. When it finds another H code, it should skip the code and take only the text. The same goes for combined codes such as H**+H**+H** or H**+H**: skip the H codes and take only the text. Matching should continue until it reaches End word. So far my regex starts at the first H code, but then it takes the whole rest of the string and ends at End word, as you can see on the regex101 link above.

I should get this as the result:

asdkjczixuqweoiurqoiweqwrjasdkjfqwe qweiouqwroiu kjasdkj czkjxklqjwekjiouasdiiaosudou oiuasodiucxzlkjqweoiu oqiwur.asdkqjwe askjdq xzc.asjdhasjd zxcjh.askdjvkjzx qweqrqwoe
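To see why the attempt above returns everything in one piece, here is a minimal Python sketch (Python's `re` is used only because it is easy to run; the lookbehind is fixed-width, which both Python and .NET accept). It assumes the sample lines are joined with plain `\n` and have no trailing spaces. The character class `[.*\w\W\n]` already matches every character through `\w\W`, so the greedy `+` only stops where the lookahead for `End word` finally succeeds.

```python
import re

# Sample text from the question (assumed: lines joined with "\n", no trailing spaces).
SAMPLE = "\n".join([
    "Sample text",
    "Sample text",
    "Sample text",
    "H319 asdkjczixuqweoiurqoiweqwrjasdkjfqwe qweiouqwroiu kjasdkj czkjxklqjwekjiouasdiiaosudou",
    "oiuasodiucxzlkjqweoiu oqiwur H320 asdkqjwe askjdq xzc",
    "H325 asjdhasjd zxcjh",
    "H331+H341+H341 askdjvkjzx qweqrqwoe",
    "End word",
    "Sample text",
    "Sample text",
])

# The attempt from the question: [.*\w\W\n] is a character class, and \w plus \W
# already covers every character, so the greedy + runs on until "End word".
ATTEMPT = r"(?<=H\d\d\d)([.*\w\W\n]+)(?=End word)"

matches = re.findall(ATTEMPT, SAMPLE)
print(len(matches))                     # 1
print(matches[0].startswith(" asdkj"))  # True
print("H320" in matches[0])             # True: the later H codes are swallowed
```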


Solution

  • You can match every character in between that is not directly followed by H and 3 digits (optionally preceded by +), and assert that End word appears further to the right at the start of a line.

    (?<=H\d{3}\b)(?:(?!\+?H\d{3}\b)[\S\s])+(?=[\s\S]*\r?\nEnd word\b)
    

    The pattern in parts matches:

    • (?<=H\d{3}\b) Positive lookbehind, assert H and 3 digits directly to the left. The word boundary \b prevents a partial match (so H3190 is not treated as H319).
    • (?: Non capture group
      • (?!\+?H\d{3}\b)[\S\s] Match a single char (including newlines), provided that an optional + followed by H and 3 digits does not start at that position
    • )+ Close non capture group and repeat 1+ times
    • (?= Positive lookahead, assert to the right
      • [\s\S]*\r?\nEnd word\b Match End word at the start of a line somewhere to the right (leave out \r?\n if End word does not have to be at the start of a line)
    • ) Close lookahead
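A minimal Python sketch of the pattern above against the question's sample text. Python's `re` is used only because it is easy to run; the constructs involved (fixed-width lookbehind, lookaheads, `[\S\s]`) should behave the same way in .NET's `Regex`. Collapsing whitespace and joining with "." are assumptions that only mirror how the expected result is written in the question.

```python
import re

# Sample text from the question (assumed: lines joined with "\n", no trailing spaces).
SAMPLE = "\n".join([
    "Sample text",
    "Sample text",
    "Sample text",
    "H319 asdkjczixuqweoiurqoiweqwrjasdkjfqwe qweiouqwroiu kjasdkj czkjxklqjwekjiouasdiiaosudou",
    "oiuasodiucxzlkjqweoiu oqiwur H320 asdkqjwe askjdq xzc",
    "H325 asjdhasjd zxcjh",
    "H331+H341+H341 askdjvkjzx qweqrqwoe",
    "End word",
    "Sample text",
    "Sample text",
])

PATTERN = re.compile(
    r"(?<=H\d{3}\b)"               # start right after an H code such as H319
    r"(?:(?!\+?H\d{3}\b)[\S\s])+"  # any char where no (optionally +prefixed) H code starts
    r"(?=[\s\S]*\r?\nEnd word\b)"  # an "End word" line must still lie ahead
)

parts = PATTERN.findall(SAMPLE)  # no capture groups, so whole matches are returned
# Collapse line breaks and surrounding spaces in each match (assumed clean-up),
# then join with "." the way the expected result in the question is written.
result = ".".join(" ".join(p.split()) for p in parts)
print(len(parts))  # 4
print(result)
```

In VB.NET the same pattern string can be passed to `Regex.Matches`, reading `Match.Value` for each match.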


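    If you also don't want a match to run past an End word line, you can add that to the negative lookahead. The ^ anchor needs the multiline flag (RegexOptions.Multiline in .NET, m on regex101) so that it matches at the start of every line: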

    (?<=H\d{3}\b)(?:(?!\+?H\d{3}\b|^End word\b)[\S\s])+(?=[\s\S]*\r?\nEnd word\b)
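A minimal Python sketch of where the two patterns differ. The input string is hypothetical: it contains two End word lines, which the question's sample does not, so that the first pattern has something to cross. `re.MULTILINE` plays the role of `RegexOptions.Multiline` here.

```python
import re

# Hypothetical input with two "End word" lines (not from the question).
TEXT = "H100 first\nEnd word\nH200 second\nEnd word"

CROSSING = r"(?<=H\d{3}\b)(?:(?!\+?H\d{3}\b)[\S\s])+(?=[\s\S]*\r?\nEnd word\b)"
STOPPING = r"(?<=H\d{3}\b)(?:(?!\+?H\d{3}\b|^End word\b)[\S\s])+(?=[\s\S]*\r?\nEnd word\b)"

# The first pattern runs straight through the first "End word" line.
print(re.findall(CROSSING, TEXT))                # [' first\nEnd word\n', ' second']
# With MULTILINE, ^ matches at each line start, so the match stops before "End word".
print(re.findall(STOPPING, TEXT, re.MULTILINE))  # [' first\n', ' second']
# Without MULTILINE, ^ only matches at the very start of the string, so nothing changes.
print(re.findall(STOPPING, TEXT) == re.findall(CROSSING, TEXT))  # True
```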
    
