regex pcre lookbehind

solving regex with positive lookbehind


I have a regex problem. I'd like the first four strings below to match. The output should be only the 3 characters between _ and .

Therefore these will match:

_20101_Bp16tt20_KG2.asc
_201_Bondp0_KGB.ASC
_2011_rndiep16tt20_232.AsC
_20101_odiep16tt20_ab3.ASC

and should return KG2, KGB, 232 and ab3, respectively.

And these will not match:

_2_ordep16tt.asc
__Bndt20_pippo_K.asc

I am able to select the whole block _KG2.asc with ((?<=_)(...)(\.(?i)(asc))). However, I just want KG2. I think I should apply a positive lookbehind, but all my attempts failed. Could you help me?


Solution

  • You could make use of \K and a positive lookahead:

    _\K[A-Za-z0-9]{3}(?=\.(?i)asc$)


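    The pattern breaks down as follows:

    • _ Match an underscore literally
    • \K Reset the start of the reported match, so the underscore consumed so far is not part of the result
    • [A-Za-z0-9]{3} Match exactly 3 characters, each an uppercase or lowercase letter or a digit (replace the character class with a dot, i.e. .{3}, if you want to match any character)
    • (?=\.(?i)asc$) Positive lookahead asserting that what follows is a dot, then asc in any mix of upper and lower case, then the end of the string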
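
For readers testing this outside PCRE: Python's standard re module does not support \K, so the following is only a minimal sketch of an equivalent pattern, not the answer's exact regex. It swaps _\K for a fixed-width positive lookbehind (?<=_), which is what the question originally asked about, and scopes the case-insensitivity to the extension with (?i:...). The function name extract_code and the use of \Z in place of $ are assumptions made for illustration.

```python
import re

# Python's re has no \K, so the lookbehind (?<=_) stands in for "_\K".
# (?i:asc) makes only the extension case-insensitive; \Z anchors at the
# very end of the string.
PATTERN = re.compile(r"(?<=_)[A-Za-z0-9]{3}(?=\.(?i:asc)\Z)")


def extract_code(filename):
    """Return the 3 characters between '_' and the final '.asc', or None."""
    match = PATTERN.search(filename)
    return match.group(0) if match else None


# Sample strings taken from the question.
samples = [
    "_20101_Bp16tt20_KG2.asc",
    "_201_Bondp0_KGB.ASC",
    "_2011_rndiep16tt20_232.AsC",
    "_20101_odiep16tt20_ab3.ASC",
    "_2_ordep16tt.asc",
    "__Bndt20_pippo_K.asc",
]

for name in samples:
    print(name, "->", extract_code(name))
```

With the question's sample strings, the first four yield KG2, KGB, 232 and ab3, and the last two yield None, because in both cases the three characters before the extension are not immediately preceded by an underscore.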