regex, notepad++, regex-lookarounds, regex-group

Regex to find a multi line string that includes another string between lines


This is my first question here.

I have a log file that contains many similar entries ("hits"), separated by blank lines:

Region: AR
OnlineID: Atl_Tuc
---Start---
FIFA 18 Legacy Edition
---END---

Region: FR
OnlineID: jubtrrzz
---Start---
FIFA 19
Undertale
Pro Evolution Soccer™ 2018
---END---

Region: US
OnlineID: Cu128yi
---Start---
KINGDOM HEARTS HD 1.5 +2.5 ReMIX
---END---

Region: RO
OnlineID: Se116
---Start---
Real Farm
EA SPORTS™ FIFA 20
LittleBigPlanet™ 3
---END---

Region: US
OnlineID: CAJ5Y
---Start---
Madden NFL 18: G.O.A.T. Super Bowl Edition
---END---

I want to find all hits that contain fifa (as a plain string). fifa is only an example; in general I need to find all hits that contain a given string.

The closest I have come is this regex: (?s)(?=^\r\n)(.*?)(fifa)(.*?)(?=\r\n\r\n)

But when I use it, the match runs through hits that have no fifa until it reaches a hit that does, so it sometimes selects more than one hit.

The second problem is that I can't use .* around (fifa), because it causes a wrong selection.

What can I do now?

The expected output is:

Region: AR
OnlineID: Atl_Tuc
---Start---
FIFA 18 Legacy Edition
---END---

Region: FR
OnlineID: jubtrrzz
---Start---
FIFA 19
Undertale
Pro Evolution Soccer™ 2018
---END---

Region: RO
OnlineID: Se116
---Start---
Real Farm
EA SPORTS™ FIFA 20
LittleBigPlanet™ 3
---END---

Solution

  • You can use

    (?si)(?:^(?<!.)|\R{2})\K(?:(?!\R{2}).)*?\bfifa\b.*?(?=\R{2}|\z)
    


    Details

    • (?si) - s makes . match line break chars (same as the ". matches newline" option being ON), and i turns case insensitive matching ON
    • (?:^(?<!.)|\R{2}) - matches the start of the file or two consecutive line break sequences
    • \K - discards the text matched so far (the line breaks) from the reported match
    • (?:(?!\R{2}).)*? - any char that does not start a double line break sequence, 0 or more occurrences but as few as possible
    • \bfifa\b - whole word fifa
    • .*? - any 0+ chars as few as possible
    • (?=\R{2}|\z) - up to the double line break or end of file.
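
If you ever need the same filtering outside Notepad++, here is a rough Python sketch of the pattern described above. It is an adaptation, not the regex itself: Python's standard re module has no \K or \R, so the sketch assumes a capture group in place of \K, spells the line break out as \r?\n, and uses \A and \Z for the start and end of the text. The names NL2, find_blocks and sample are made up for illustration.

```python
import re

# Two consecutive line breaks (\n or \r\n); stands in for \R{2}.
NL2 = r"(?:\r?\n){2}"


def find_blocks(text, word):
    """Return each blank-line-separated block containing `word` as a whole word."""
    pattern = re.compile(
        rf"(?:\A|{NL2})"              # start of text or a blank-line separator
        rf"((?:(?!{NL2}).)*?"         # chars that do not start a separator
        rf"\b{re.escape(word)}\b"     # the whole word being searched for
        rf".*?)"                      # rest of the block, as few chars as possible
        rf"(?={NL2}|\Z)",             # stop before the next separator or at the end
        re.DOTALL | re.IGNORECASE,    # same roles as (?si)
    )
    # One capture group, so findall returns the captured block texts.
    return pattern.findall(text)


sample = (
    "Region: AR\nOnlineID: Atl_Tuc\n---Start---\n"
    "FIFA 18 Legacy Edition\n---END---\n\n"
    "Region: US\nOnlineID: Cu128yi\n---Start---\n"
    "KINGDOM HEARTS HD 1.5 +2.5 ReMIX\n---END---\n\n"
    "Region: RO\nOnlineID: Se116\n---Start---\n"
    "Real Farm\nEA SPORTS™ FIFA 20\nLittleBigPlanet™ 3\n---END---"
)

for block in find_blocks(sample, "fifa"):
    print(block.splitlines()[1])
```

With this sample, the loop prints the OnlineID lines for Atl_Tuc and Se116; the Cu128yi block is skipped because the (?:(?!...).)*? part cannot run past the blank line to reach a fifa in a later block.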

    Now, if you want to match a paragraph that has fifa and then 20 on one of its lines, use

    (?si)(?:^(?<!.)|\R{2})\K(?:(?!\R{2}).)*?(?-s:\bfifa\b.*\b20\b).*?(?=\R{2}|\z)
    

    The (?-s:\bfifa\b.*\b20\b) is a modifier group where . stops matching line breaks, and it matches a whole word fifa, then any 0+ chars other than line break chars, as many as possible, and then a 20 as a whole word.
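
The "same line" condition can also be checked in a script. The Python sketch below takes a different route from the single regex above: it splits the text on blank lines first and then tests each block with a pattern compiled without DOTALL, so . cannot cross a line break, which is the effect of (?-s:...). The function name blocks_with_same_line_pair and the sample data are hypothetical.

```python
import re


def blocks_with_same_line_pair(text, first, second):
    """Return blocks where `first` is followed by `second` on the same line."""
    # No re.DOTALL here, so "." never matches "\n": both words must share a line.
    same_line = re.compile(
        rf"\b{re.escape(first)}\b.*\b{re.escape(second)}\b",
        re.IGNORECASE,
    )
    # Blocks are separated by one or more blank lines (\n or \r\n endings).
    blocks = re.split(r"(?:\r?\n){2,}", text.strip())
    return [block for block in blocks if same_line.search(block)]


sample = (
    "Region: AR\nOnlineID: Atl_Tuc\n---Start---\n"
    "FIFA 18 Legacy Edition\n---END---\n\n"
    "Region: FR\nOnlineID: jubtrrzz\n---Start---\n"
    "FIFA 19\nUndertale\nPro Evolution Soccer™ 2018\n---END---\n\n"
    "Region: RO\nOnlineID: Se116\n---Start---\n"
    "Real Farm\nEA SPORTS™ FIFA 20\nLittleBigPlanet™ 3\n---END---"
)

for block in blocks_with_same_line_pair(sample, "fifa", "20"):
    print(block.splitlines()[1])
```

Only the Se116 block is printed here. The 2018 in the jubtrrzz block does not count, both because it sits on a different line from FIFA and because \b20\b needs 20 to be a whole word.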
