The goal is to keep cardinal and ordinal numbers at the beginning of the string as long as they come immediately before either the word PERFORMANCE or SCORE:
#These numbers are kept:
100 SCORE FOR STUDENT
80 PERFORMANCE FOR TEACHER
However, if the numbers are at the start and the following word is different, then they should be removed:
#These numbers are removed
10095TH 10097TH 179TH SCHOOL ANIVERSARY
11 12 10 SECONDARY LEVELS
100 100 100 100 SCHOOL AGREEMENT
The issue I have is when there are several groups of digits separated by spaces before the word PERFORMANCE or SCORE:
#All numbers should be kept
3 10 100 PERFORMANCE
001 10 12345 SCORE
I am applying the following regex, but the last section, (?!\s*\d*\s*\d*\s*(?:PERFORMANCE|SCORE)\b), is messy because it currently only considers 3 sets of numbers before PERFORMANCE or SCORE to be kept:
(?<=[A-Za-z]\b )([ 0-9]*(ST|[RN]D|TH)?\b)|^(([\d ]+(ST|[RN]D|TH)?)*\b)(?!\s*\d*\s*\d*\s*(?:PERFORMANCE|SCORE)\b)
The previous regex works for the following:
3 10 100 PERFORMANCE
001 10 12345 SCORE
But it will not work if I add an additional set of digits:
3 10 100 1 PERFORMANCE
001 10 1 12345 SCORE
How can I generalize this rule to cover any number of digit sets?
Thanks
Try the following:
^(?:\d+(?:ST|[RN]D|TH)?\s)+(?=[^\d]+$)(?!PERFORMANCE|SCORE)
^ anchor to beginning
(?: start non-capturing group
\d+ match one or more digits
(?:ST|[RN]D|TH)? optionally followed by one of your approved suffixes
\s then a whitespace
)+ one or more times
(?=[^\d]+$) assert that the rest of the line is number-free (this stops the regex from backtracking and ending the match before the last number)
(?!PERFORMANCE|SCORE) assert that the following characters are NOT 'PERFORMANCE' or 'SCORE'
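
The question does not say which language is in use, so here is a minimal sketch in Python (an assumption) that applies the pattern above to the sample strings. The names LEADING_NUMBERS and strip_leading_numbers are hypothetical and chosen only for illustration.

```python
import re

# Pattern from the answer above, unchanged.
# LEADING_NUMBERS and strip_leading_numbers are hypothetical names.
LEADING_NUMBERS = re.compile(
    r"^(?:\d+(?:ST|[RN]D|TH)?\s)+(?=[^\d]+$)(?!PERFORMANCE|SCORE)"
)


def strip_leading_numbers(text: str) -> str:
    """Drop leading cardinal/ordinal numbers unless PERFORMANCE or SCORE follows."""
    return LEADING_NUMBERS.sub("", text)


samples = [
    "100 SCORE FOR STUDENT",
    "3 10 100 1 PERFORMANCE",
    "001 10 1 12345 SCORE",
    "10095TH 10097TH 179TH SCHOOL ANIVERSARY",
    "11 12 10 SECONDARY LEVELS",
]

for sample in samples:
    # Print each input next to its cleaned form.
    print(f"{sample!r} -> {strip_leading_numbers(sample)!r}")
```

One consequence of the (?=[^\d]+$) lookahead worth keeping in mind: if any digit appears later in the line (for example 10 SCHOOL 5), the pattern does not match at all and the leading number is left in place. If the input holds several lines in one string, the ^ and $ anchors would also need the re.MULTILINE flag.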