
Regular expression to extract integers

I need help extracting numbers from a column that stores text. The text can also contain prices, which I don't want to extract. For example, given the following text:

text = "I have the following products 4526 and 4. The first one I paid $40 while the second one 30€. \nHere the link for the discount of 3.99: https://www.xysyffd.coom/7574@5757"

My expected result would be

[4526, 4]

Right now I am using the following regular expression


which discards the 3.99 but still matches the prices and the numbers in the link. Any suggestions on how to update the regex?


  • Use

      (?<!\S)[0-9]+(?!\.\d|[^\s!?.])


      (?<!                     look behind to see if there is not:
        \S                       non-whitespace (all but \n, \r, \t, \f,
                                 and " ")
      )                        end of look-behind
      [0-9]+                   any character of: '0' to '9' (1 or more
                               times (matching the most amount possible))
      (?!                      look ahead to see if there is not:
        \.                       '.'
        \d                       a digit (0-9)
       |                       OR
        [^\s!?.]                 any character except: whitespace (\n,
                                 \r, \t, \f, and " "), '!', '?', '.'
      )                        end of look-ahead

    Python code:

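    import re

    # Match a run of digits that starts after whitespace (or at the start of the text)
    # and is not followed by a decimal part or by anything other than whitespace, '!', '?' or '.'
    regex = r"(?<!\S)[0-9]+(?!\.\d|[^\s!?.])"
    test_str = "I have the following products 4526 and 4. The first one I paid $40 while the second one 30€. \nHere the link for the discount of 3.99: https://www.xysyffd.coom/7574@5757"
    matches = re.findall(regex, test_str)
    print(matches)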

    Results: ['4526', '4']
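
Since the question asks for integers and for a whole column of text, here is a minimal sketch of how the same pattern could be wrapped up for that. The helper name `extract_integers` and the `rows` list are assumptions made for illustration (a plain list stands in for the column); the pattern itself is unchanged, just written with `re.VERBOSE` so the explanation travels with it.

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so each part is commented.
INTEGER_PATTERN = re.compile(
    r"""
    (?<!\S)        # not preceded by a non-whitespace character
    [0-9]+         # one or more digits
    (?!            # not followed by:
        \.\d       #   a decimal part such as ".99"
      | [^\s!?.]   #   or any character other than whitespace, '!', '?', '.'
    )
    """,
    re.VERBOSE,
)


def extract_integers(text):
    """Return the standalone integers in text as a list of int (hypothetical helper)."""
    return [int(m) for m in INTEGER_PATTERN.findall(text)]


# Hypothetical stand-in for the text column
rows = [
    "I have the following products 4526 and 4. The first one I paid $40 while the second one 30€. \nHere the link for the discount of 3.99: https://www.xysyffd.coom/7574@5757",
    "No numbers here, only a price of 12.50.",
    "Order 17!",
]
results = [extract_integers(row) for row in rows]
print(results)  # [[4526, 4], [], [17]]
```

One thing to watch: the look-ahead only tolerates whitespace, '!', '?' and '.' after the digits, so a number directly followed by a comma (as in "4526, 4") is skipped. If that matters for your data, adding ',' to the character class should cover it.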