Which regex can I use to find exactly two sub-strings in a longer string?


I am completely new to RegEx and I am struggling even with a simple case.

I would like to identify the following cases, for example:

IR!GBP!INDEX.GBP
IR!USD!INDEX.USD

where the sub-string GBP (or USD) appears exactly twice in the bigger string. The second occurrence may appear only directly after the sub-string "INDEX." (including the dot). How can I detect this via RegEx?

The big string is formed as follows: the first bit is always "IR", the second is a currency, the third another string, all of them separated by the exclamation mark.

For example, IR!GBP!COUNTERPARTY.USD or IR!USD!INDEX.GBP should not return a match.

I hope my question is clear, and thanks a lot in advance for your help!

I tried various combinations with [a-zA-Z] and +? but ended up nowhere. I admit my inability!


Solution

  • To me it seems like the following ticks your boxes:

    ^IR!([A-Z]+)![A-Z]+\.\1$
    

    See an online demo. The pattern means:

    • ^IR! - Match start of string followed by exactly uppercase 'IR' and an exclamation mark;
    • ([A-Z]+) - A 1st capture group to hold the content of 1+ uppercase characters (your currency);
    • ![A-Z]+\. - A literal '!' before 1+ uppercase chars and a literal (escaped) dot;
    • \1 - Match the content of what was captured in the 1st group;
    • $ - End-of-string anchor (end of line if multiline mode is enabled).
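
As a quick check, here is a minimal Python sketch that runs the pattern above against a few sample strings. Note that the pattern accepts any uppercase third part, not only INDEX, so `STRICT_PATTERN` is a hypothetical variant (not part of the original answer) that hard-codes INDEX, in case that is what the question requires. The sample list and the `is_match` helper are likewise only illustrative.

```python
import re

# Pattern from the answer: the currency captured in group 1
# must appear again right after the dot.
PATTERN = re.compile(r"^IR!([A-Z]+)![A-Z]+\.\1$")

# Hypothetical stricter variant (an assumption, not from the answer):
# the third part must be exactly "INDEX".
STRICT_PATTERN = re.compile(r"^IR!([A-Z]+)!INDEX\.\1$")

samples = [
    "IR!GBP!INDEX.GBP",
    "IR!USD!INDEX.USD",
    "IR!GBP!COUNTERPARTY.USD",
    "IR!USD!INDEX.GBP",
    "IR!GBP!COUNTERPARTY.GBP",
]


def is_match(pattern, text):
    """Return True if the compiled pattern matches the text."""
    return pattern.match(text) is not None


# Map each sample to (matches answer pattern, matches strict variant).
results = {s: (is_match(PATTERN, s), is_match(STRICT_PATTERN, s)) for s in samples}

for s, (loose, strict) in results.items():
    print(f"{s:26} answer={loose!s:5} strict={strict}")
```

The last sample is the one where the two patterns disagree: the currency repeats, but the third part is COUNTERPARTY rather than INDEX, so only the answer's pattern matches it.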