I'm trying to extract specific tokens from a string like the one below. I tried Python's re module with the expression
r'[A-Z]{5}[A-Z0-9]{2}'
but it also returns unwanted text. The expected output is shown below.
Conditions:
Given String:
"DHKGNC1, DHDHK32, DHKGN1K, SOME, GARBAGE, TEXT"
Expected output: ['DHKGNC1', 'DHDHK32', 'DHKGN1K']
Actual output: ['DHKGNC1', 'DHDHK32', 'DHKGN1K', 'GARBAGE']
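Don't use [A-Z0-9]{2} for the last two characters: it also accepts two letters, which is why GARBAGE matches. Use ([A-Z0-9][0-9])|([0-9][A-Z0-9]) instead.
That is, at least one of the last two characters has to be a digit.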
import re
re.findall(r'[A-Z]{5}(?:[A-Z0-9][0-9]|[0-9][A-Z0-9])', "DHKGNC1, DHDHK32, DHKGN1K, SOME, GARBAGE, TEXT")
['DHKGNC1', 'DHDHK32', 'DHKGN1K']
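
If the tokens can also appear inside longer runs of letters and digits, the pattern above would match part of such a run. Here is a minimal sketch that adds \b word boundaries to guard against that. The boundaries and the helper name find_codes are my own assumptions, not something the question asked for, so drop them if partial matches are fine.

```python
import re

# Five uppercase letters, then two characters of which at least one is a digit.
# The \b anchors are an added assumption: they stop the pattern from matching
# inside a longer token such as "XXDHKGNC1YY".
CODE_PATTERN = re.compile(r'\b[A-Z]{5}(?:[A-Z0-9][0-9]|[0-9][A-Z0-9])\b')


def find_codes(text):
    """Return every whole-word match of CODE_PATTERN in text (hypothetical helper)."""
    return CODE_PATTERN.findall(text)


text = "DHKGNC1, DHDHK32, DHKGN1K, SOME, GARBAGE, TEXT"
print(find_codes(text))  # ['DHKGNC1', 'DHDHK32', 'DHKGN1K']
```

The pattern has no capturing groups, so findall returns the full matched strings rather than tuples of groups.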