
Capture a word `+` same word again but with a prefix


To all the Regex gurus

Any idea how to handle this beast?

string = 'Position_Name [+|-|/|*] PrevYear Position_Name'

I'm looking for a regex that matches both occurrences of Position_Name. It looks like a duplicate, but it is not really one: the first occurrence is followed by a special character and then by the same word again with a prefix, here 'PrevYear'. Position_Name is dynamic and could be any word (e.g. Profit, Sales, etc.), but PrevYear stays constant.

So how can I identify the lines where a position is mentioned twice with a math symbol in the middle (for now), and then capture those three elements? The symbol could be a plus (+), a minus (-), a division sign (/) or a multiplication sign (*), which is what [+|-|/|*] is meant to represent in my example.

PS: I don't mind doing this in two steps, first matching and then capturing, but I would still need the regex to find these little gems (in hundreds of lines).

Finding plain dupes elegantly is not the problem, e.g. via \b(\w+) \1\b, but I have come to realize my skills are not sufficient for this combination.

Thanks for any hints and support.


Solution

  • You can use

    \b(\w+)\b\s*[-+/*]\s*PrevYear\s*\1\b
    

    Details:

    • \b - a word boundary
    • (\w+) - Group 1: one or more word chars
    • \b - a word boundary
    • \s*[-+/*]\s* - a -, +, / or * surrounded by zero or more whitespace characters
    • PrevYear - a fixed word
    • \s* - zero or more whitespace characters
    • \1 - same value as captured in Group 1
    • \b - a word boundary.
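
    Since the question asks for the three elements to be captured, here is a minimal Python sketch of how the pattern might be applied line by line. It is not part of the original pattern: the extra capturing groups around the operator and the prefixed word, the function name find_prev_year_pairs and the sample lines are all assumptions made for illustration.

```python
import re

# Same pattern as above, with two extra groups (an assumption, added so the
# operator and the prefixed word can be read back separately):
#   group 1: the position name
#   group 2: the math symbol
#   group 3: "PrevYear" followed by the same position name
PATTERN = re.compile(r"\b(\w+)\b\s*([-+/*])\s*(PrevYear\s*\1)\b")


def find_prev_year_pairs(lines):
    """Return (name, operator, prefixed name) for every match in the given lines."""
    results = []
    for line in lines:
        for m in PATTERN.finditer(line):
            results.append((m.group(1), m.group(2), m.group(3)))
    return results


# Hypothetical sample lines; only the first two repeat the same word.
sample_lines = [
    "Profit + PrevYear Profit",
    "Sales / PrevYear Sales",
    "Sales - PrevYear Profit",
]

matches = find_prev_year_pairs(sample_lines)
for name, operator, prefixed in matches:
    print(name, operator, prefixed)
```

    The third sample line is skipped because the backreference \1 requires the word after PrevYear to be identical to the one captured in Group 1.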