
Regex is unable to match alphanumeric strings with spaces in Python


I have the following list of strings in Python:

LIST1=["AR BR_18_0138249",  "AR R_16_01382649",  "BR 16 0138264", "R 16 01382679" ]

In the list above, some items separate the alphanumeric parts with underscores, while the last two items use spaces instead. I expect the following output:

  "AR BR_18_0138249"
  "AR R_16_01382649"
  "BR 16 0138264"
  "R 16 01382679" 

I have tried the following code

import regex as re
pattern = r"(\bB?R_\w+)(?!.*\1)|(\bB?R \w+)(?!.*\1)|(\bR?^sd \w+)(?!.*\1)"
for i in LIST1:
    rest = re.search(pattern, i)
    if rest:
        print(rest.group(1))

I have obtained the following result

BR_18_0138249
R_16_01382649
None
None

I am unable to match the sequences that contain spaces. Could someone point me in the right direction?


Solution

  • You can use

    \b(B?R(?=([\s_]))(?:\2\d+)+)\b(?!.*\b\1\b)
    

    See the regex demo

    Details

    • \b - a word boundary
    • (B?R(?=([\s_]))(?:\2\d+)+) - Group 1: an optional B, then R, then a lookahead that captures the next whitespace or underscore into Group 2, then one or more sequences of that same delimiter (\2) followed by one or more digits (if you need to support letters here, replace \d+ with [^\W_]+)
    • \b - a word boundary
    • (?!.*\b\1\b) - a negative lookahead that fails the match if, to the right of the current position, there are
      • .* - any zero or more chars other than line break chars, as many as possible
      • \b\1\b - the same value as in Group 1 matched as a whole word (not enclosed with letters, digits or underscores).
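The two less obvious parts of the pattern, the negative lookahead and the \2 backreference, can be seen in a small sketch. The input strings below are made up for illustration and are not taken from the question:

```python
import re

# Same pattern as in the answer, compiled once for reuse.
pattern = re.compile(r"\b(B?R(?=([\s_]))(?:\2\d+)+)\b(?!.*\b\1\b)")

# Hypothetical input: the code "BR_18_01" appears twice in one string.
text = "BR_18_01 R 16 02 BR_18_01"

# The negative lookahead rejects the first "BR_18_01" because the same
# value occurs again further right, so only its last occurrence is kept.
codes = [m.group(1) for m in pattern.finditer(text)]
print(codes)  # ['R 16 02', 'BR_18_01']

# Group 2 pins the delimiter: once "_" is captured, a following space
# is not accepted as a separator, so the match stops after "R_16".
mixed = pattern.search("R_16 0138").group(1)
print(mixed)  # R_16
```

If mixed delimiters inside one code should be allowed, replacing \2 with [\s_] is one option, at the cost of also joining neighbouring space-separated numbers into the match.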

    Here is a Python demo using the built-in re module (you do not need the PyPI regex module here):

    import re
    LIST1=["AR BR_18_0138249",  "AR R_16_01382649",  "BR 16 0138264", "R 16 01382679" ]
    pattern = r"\b(B?R(?=([\s_]))(?:\2\d+)+)\b(?!.*\b\1\b)"
    for i in LIST1:
      rest = re.search(pattern, i)
      if rest:
        print(rest.group(1))
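
As for why the original attempt printed None: each alternative there sits in its own capturing group, and group(1) is only set when the first alternative is the one that matched. The sketch below uses a simplified two-alternative version of that pattern (a simplification assumed for illustration, without the lookaheads) to show the effect with the built-in re module:

```python
import re

# Simplified version of the original attempt (an assumption made for
# illustration): two alternatives, each in its own capturing group.
attempt = re.compile(r"(\bB?R_\w+)|(\bB?R \w+)")

m = attempt.search("BR 16 0138264")

# The second alternative matched, so group 1 did not participate.
print(m.group(1))  # None
print(m.group(2))  # BR 16
# group(0) is always the whole match, whichever alternative succeeded.
print(m.group(0))  # BR 16
```

Note that even group(0) stops at "BR 16" here, because \w+ cannot cross the second space. The underscore cases worked only because \w also matches "_", which is why the solution above repeats the delimiter-plus-digits sequence explicitly.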