I'd like to match numbers, positive or negative, possibly with a currency sign in front. But I don't want to match something like PSM-9. My code is:
import re
test='AAA PCSK-9, $111 -3,33'
re.findall(r'\b-?[$€£]?-?\d+[\d,.]*\b', test)
The output is: ['-9', '111', '3,33']
Could someone explain why -9 is matched? Thank you in advance.
Edit: I don't want any part of PCSK-9 to be matched; it is the name of a product rather than a number. So my desired output is:
['111', '3,33']
The word boundary \b matches between the K and the dash, because K is a word character and the dash is not. The two parts after the dash, [$€£]? and -?, are optional because of the question mark, and then one or more digits are matched. This results in the match -9.
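To see this for yourself, here is a small sketch that reuses the test string and pattern from the question. The variable names (first, k_dash, space_symbol) are only for illustration.

```python
import re

test = 'AAA PCSK-9, $111 -3,33'
pattern = r'\b-?[$€£]?-?\d+[\d,.]*\b'

# The first match starts at the dash directly after the "K".
first = re.search(pattern, test)
print(first.group(), first.start())

# \b holds between "K" (a word character) and "-" (a non-word character) ...
k_dash = bool(re.search(r'K\b-', test))
print(k_dash)

# ... but not between a space and "$" or "-", since both sides are non-word characters.
space_symbol = bool(re.search(r' \b[$-]', test))
print(space_symbol)
```

The last check also shows why the $ and the minus sign are missing from '111' and '3,33': there is no word boundary in front of them, so the match can only start at the first digit.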
Instead of a word boundary, you might use assertions that check that what comes directly before and after the match is not a non-whitespace character (\S), using a negative lookbehind and a negative lookahead:
(?<!\S)-?[$€£]?(\d+(?:[,.]\d+)?)(?!\S)
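A quick way to check the pattern against the test string from the question is sketched below. The names result and full are just for illustration.

```python
import re

test = 'AAA PCSK-9, $111 -3,33'
pattern = r'(?<!\S)-?[$€£]?(\d+(?:[,.]\d+)?)(?!\S)'

# With a capture group, findall returns only the group:
# the digits without the sign or currency symbol.
result = re.findall(pattern, test)
print(result)  # ['111', '3,33']

# The full matches still include the sign and the currency symbol.
full = [m.group(0) for m in re.finditer(pattern, test)]
print(full)  # ['$111', '-3,33']
```

Note that (?!\S) requires whitespace or the end of the string right after the number, so a number followed directly by punctuation (for example a trailing comma) would not match with this pattern.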