Regular expression for searching only natural numbers


I need to write a regular expression that finds natural numbers in a text. The numbers can be embedded in words or surrounded by any special characters. The main condition is that a match is a sequence of digits that is not preceded by + or -.

I wrote something like this:

import re

text = "lOngPa$$W0Rd 2342 +4324423 -4234234 fsdf 4 fdsfsr +45 frwr gfdg42f"
match = re.findall(r'[^+-]\d+', text)
print(match)

As I understand it, the expression looks for a sequence of one or more digits that is not preceded by + or -.

Output:
['W0', ' 2342', '4324423', '4234234', ' 4', '45', 'g42']

Where do the spaces and letters in the output come from?


Solution

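  • Note that [^+-] matches exactly one character that is not + or -, so it consumes a letter, a space, or even a digit, and that character becomes part of the match. That is why you get 'W0' and ' 2342', for example, and why '4324423' is still returned: its first digit is the character matched by [^+-].

    What you need instead is a negative lookbehind, which checks the preceding character without including it in the match; see Regular Expression Syntax: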

    import re

    text = "lOngPa$$W0Rd 2342 +4324423 -4234234 fsdf 4 fdsfsr +45 frwr gfdg42f"
    match = re.findall(r'(?<![+-])\d+', text)
    print(match)
    # ['0', '2342', '324423', '234234', '4', '5', '42']
    

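    However, the requirement probably means something like the code below, where an entire run of digits is rejected if it is preceded by + or -. Adding \d to the lookbehind's character class prevents a match from starting in the middle of a digit run, so the tail of a signed number, such as '324423', is no longer returned. I also converted the strings to integers, which I assume is what is needed: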

    import re

    text = "lOngPa$$W0Rd 2342 +4324423 -4234234 fsdf 4 fdsfsr +45 frwr gfdg42f"
    # re.findall returns the strings ['0', '2342', '4', '42'] before the conversion to int
    natural_numbers = [int(n) for n in re.findall(r'(?<![\d+-])\d+', text)]
    print(natural_numbers)
    # [0, 2342, 4, 42]
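
To make the difference between the three patterns easier to see, here is a minimal sketch that runs them side by side on the same text. The names CONSUMING, LOOKBEHIND, WHOLE_RUN and find_unsigned_numbers are hypothetical and exist only for this illustration; the patterns themselves are the ones discussed above.

```python
import re

text = "lOngPa$$W0Rd 2342 +4324423 -4234234 fsdf 4 fdsfsr +45 frwr gfdg42f"

# Hypothetical names, chosen only for this illustration.
CONSUMING = re.compile(r'[^+-]\d+')        # matches one extra character before the digits
LOOKBEHIND = re.compile(r'(?<![+-])\d+')   # can still start inside a signed number
WHOLE_RUN = re.compile(r'(?<![\d+-])\d+')  # may only start at the beginning of a digit run


def find_unsigned_numbers(s):
    """Return each whole digit run in s that is not directly preceded by + or -."""
    return [int(n) for n in WHOLE_RUN.findall(s)]


for name, pattern in [("consuming", CONSUMING),
                      ("lookbehind", LOOKBEHIND),
                      ("whole run", WHOLE_RUN)]:
    print(name, pattern.findall(text))

print(find_unsigned_numbers(text))
```

Compiling the patterns once is only a convenience here; calling re.findall with the raw pattern strings, as in the answer, should give the same results.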