I know this question has been answered, but my use case is slightly different. I am trying to set up a regex pattern to split a few strings into a list.
Input Strings:
1. "ABC-QWERT01"
2. "ABC-QWERT01DV"
3. "ABCQWER01"
Criteria of the string:
ABC  -  QWERT  01  DV
1    2  3      4   5
Expected Output
1. ['ABC','-','QWERT','01']
2. ['ABC','-','QWERT','01', 'DV']
3. ['ABC','QWER','01','DV']
I have tried the following patterns in a bunch of different ways, but I am missing something. My thought was to start at the beginning of the string, split after the first three characters or the dash, then split on the occurrence of two digits.
Pattern 1: r"([ -?, \d{2}])+"
This works, but it doesn't break up the string after the first three characters if the dash is missing.
Pattern 2: r"([^[a-z]{3}, -?, \d{2}])+"
This fails to match, so nothing gets split.
Pattern 3: r"([^[a-z]{3}|-?, \d{2}])+"
This also fails to match, so nothing gets split.
Any tips or suggestions?
You can use a pattern similar to:
(?i)([A-Z]{3})(-?)([A-Z]*)([0-9]{2})([A-Z]*)
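import re

def _parts(s):
    # Five capture groups: 3 letters, optional dash, letters, 2 digits, letters.
    p = r'(?i)([A-Z]{3})(-?)([A-Z]*)([0-9]{2})([A-Z]*)'
    # With several groups, findall returns a list of tuples, one per match.
    return re.findall(p, s)

print(_parts('ABC-QWERT01DV'))
print(_parts('ABCQWER01'))
print(_parts('ABC-QWERT01'))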
Output:
[('ABC', '-', 'QWERT', '01', 'DV')]
[('ABC', '', 'QWER', '01', '')]
[('ABC', '-', 'QWERT', '01', '')]
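Note that findall returns tuples here, and groups that matched nothing show up as empty strings. If you want a flat list like the one in the question, one possible sketch is below. The function name split_parts is made up for illustration, and using fullmatch (so the whole string must fit the pattern) is my assumption about what you want, not something the question states.

```python
import re

# Same pattern as above; re.IGNORECASE replaces the inline (?i) flag.
PATTERN = re.compile(r'([A-Z]{3})(-?)([A-Z]*)([0-9]{2})([A-Z]*)', re.IGNORECASE)

def split_parts(s):
    """Return the non-empty captured pieces of s, or [] if s does not match."""
    m = PATTERN.fullmatch(s)
    if m is None:
        return []
    # Drop groups that matched the empty string (missing dash or suffix).
    return [g for g in m.groups() if g]

print(split_parts('ABC-QWERT01'))    # ['ABC', '-', 'QWERT', '01']
print(split_parts('ABC-QWERT01DV'))  # ['ABC', '-', 'QWERT', '01', 'DV']
print(split_parts('ABCQWER01'))      # ['ABC', 'QWER', '01']
```

Returning an empty list for non-matching input is only one option; raising an exception may suit you better if malformed strings should not pass silently.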
Pattern breakdown:
(?i): case-insensitive flag.
([A-Z]{3}): capture group 1 with any 3 letters.
(-?): capture group 2 with an optional dash.
([A-Z]*): capture group 3 with 0 or more letters.
([0-9]{2}): capture group 4 with 2 digits.
([A-Z]*): capture group 5 with 0 or more letters.
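As for why the attempts in the question misbehave: square brackets in a regex define a character class, which matches a single character from a set. Commas, spaces, and {2} inside the brackets are therefore treated as literal members of the set rather than as separators or quantifiers. The small sketch below illustrates this using the bracketed part of Pattern 1; the helper name in_class is made up for the demonstration.

```python
import re

# The bracketed part of Pattern 1 from the question, without the group and '+'.
# Inside [...], ' -?' is the range from ' ' to '?', which covers '-' and the digits;
# ',', '{', '2' and '}' are literal members of the set.
CLASS = re.compile(r"[ -?, \d{2}]")

def in_class(text):
    """True if text is exactly one character accepted by the class."""
    return CLASS.fullmatch(text) is not None

print(in_class("-"))   # True
print(in_class("7"))   # True
print(in_class("{"))   # True
print(in_class("A"))   # False
print(in_class("01"))  # False
```

This would explain why Pattern 1 splits on the dash and on digits but never after the first three letters: letters are not in the set at all, and the class has no notion of position or count.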