I need to write a regular expression that finds natural numbers in a text. The numbers can appear inside words and next to any special characters. The main condition is that a sequence of digits must not be preceded by + or -.
I wrote something like this:
import re
text = "lOngPa$$W0Rd 2342 +4324423 -4234234 fsdf 4 fdsfsr +45 frwr gfdg42f"
match = re.findall(r'[^+-]\d+', text)
print(match)
As I understand it, the expression looks for a sequence of one or more digits with no + or - in front of it.
Output:
['W0', ' 2342', '4324423', '4234234', ' 4', '45', 'g42']
Where do the spaces and letters in the output come from?
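Note that [^+-] matches exactly one character that is not + or -, so it matches a letter, a space, etc. Hence you are getting 'W0' and ' 2342', for example. It can even match a digit, which is why '4324423' appears despite the leading +: the class consumes the first 4 and \d+ takes the rest.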
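To see exactly which character the class consumes, each match can be split into its first character and the digits after it. This is only an illustrative sketch; the capture groups, the variable names, and the "7 apples" input are my additions, not part of the question.

```python
import re

text = "lOngPa$$W0Rd 2342 +4324423 -4234234 fsdf 4 fdsfsr +45 frwr gfdg42f"

# Capture the character consumed by [^+-] and the digits after it separately.
pairs = re.findall(r'([^+-])(\d+)', text)
for consumed, digits in pairs:
    print(repr(consumed), repr(digits))

# Because [^+-] must consume a character, a lone digit at the very
# start of the string is missed entirely.
missed = re.findall(r'[^+-]\d+', "7 apples")
print(missed)  # []
```

The first element of each pair is the unwanted character that ends up in the original output.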
What you need instead is a negative lookbehind, which checks the preceding character without consuming it; see Regular Expression Syntax:
import re
text = "lOngPa$$W0Rd 2342 +4324423 -4234234 fsdf 4 fdsfsr +45 frwr gfdg42f"
match = re.findall(r'(?<![+-])\d+', text)
print(match)
# ['0', '2342', '324423', '234234', '4', '5', '42']
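However, the pattern above still matches the tail of a signed number (for example, '324423' from +4324423). The requirements probably meant something like the code below instead, where the longest sequence of digits cannot be preceded by a + or - sign. Adding \d to the lookbehind prevents a match from starting in the middle of a number. Note that I also converted the strings into integers (which, I assume, is what is needed):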
import re
text = "lOngPa$$W0Rd 2342 +4324423 -4234234 fsdf 4 fdsfsr +45 frwr gfdg42f"
# re.findall alone returns the strings ['0', '2342', '4', '42']; int() converts them.
natural_numbers = [int(n) for n in re.findall(r'(?<![\d+-])\d+', text)]
print(natural_numbers)
# [0, 2342, 4, 42]
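If the pattern is needed in more than one place, it can be wrapped in a small helper. The sketch below is one possible arrangement: the names UNSIGNED_RUN and find_unsigned_numbers and the sample input are hypothetical, chosen only to show a few edge cases (a number at the start of the string, signed numbers, digits inside a word, leading zeros).

```python
import re

# Digit runs that are not preceded by another digit, '+' or '-'.
UNSIGNED_RUN = re.compile(r'(?<![\d+-])\d+')

def find_unsigned_numbers(text):
    """Return integers for digit runs not preceded by '+' or '-'."""
    return [int(m) for m in UNSIGNED_RUN.findall(text)]

result = find_unsigned_numbers("7 apples, +12, -3, x99y, 007")
print(result)  # [7, 99, 7]
```

Unlike a character class, the lookbehind succeeds at the start of the string, so the leading 7 is found. Note that int() drops leading zeros, so '007' becomes 7.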