
How do I capture a price with thousands and decimal separators using a regex in Python?


I currently have working code, but its one flaw is that I didn't write the Python regex in the optimal way.

The original text contains amounts in the thousands, hundreds of thousands, and millions, with no meaningful decimal part: the decimals are always ",00".

Example line from the text:

Debt 1 of 2 for an amount of: $ 58.610,00, Unpaid

Right now, the following code captures millions correctly, but for amounts below 100,000 it skips one digit:

import re

line = "Debt 1 of 2 for an amount of: $ 58.610,00, Unpaid"
regex = r"(\d+).(\d+).(\d+),(\d+)"
match = re.search(regex, line, re.MULTILINE)
print("$" + match.group(1) + match.group(2) + match.group(3))

It produces this:

$5860

But the target is:

$58610

If the amount is in the millions, it is captured correctly. I wrote it that way because the currency I'm working with involves large amounts, so I constantly deal with quantities of that size.

Regards


Solution

  • You can use the following regex to extract the expected matches and then remove the thousands separators:

    \$\s?(\d{1,3}(?:\.\d{3})+)(?:,\d+)?(?!\d)
    

    You need to take the Group 1 value, remove the periods from it, and prepend $ again. Details:

    • \$ - a $ char
    • \s? - an optional whitespace
    • (\d{1,3}(?:\.\d{3})+) - Group 1: one to three digits, followed by one or more occurrences (since you only want to match amounts of a thousand or more) of a . and three digits
    • (?:,\d+)? - an optional sequence of a comma and one or more digits
    • (?!\d) - no digit is allowed immediately on the right.
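For comparison, here is a minimal sketch of why the original attempt drops a digit. In `(\d+).(\d+).(\d+),(\d+)` the dots are unescaped, so each one matches any character. An amount with a single thousands separator, like 58.610,00, has only two digit runs before the decimal comma, while the pattern needs three, so the engine backtracks until one `.` swallows a digit (the `1` of `610`). The escaped `\.` in the pattern above only matches a literal period. The sample line is the one from the question; `original` and `fixed` are just illustrative names.

```python
import re

text = 'Debt 1 of 2 for an amount of: $ 58.610,00, Unpaid'

# Original attempt: '.' is unescaped, so it matches ANY character,
# which lets the engine spend one of them on a digit.
original = re.search(r"(\d+).(\d+).(\d+),(\d+)", text)
print(original.groups())   # ('58', '6', '0', '00')

# Answer's pattern: '\.' only matches a literal period separator.
fixed = re.search(r'\$\s?(\d{1,3}(?:\.\d{3})+)(?:,\d+)?(?!\d)', text)
print(fixed.group(1))      # 58.610
```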

    See the Python demo:

    import re
    text = 'Debt 1 of 2 for an amount of: $ 58.610,00, Unpaid'
    match = re.search(r'\$\s?(\d{1,3}(?:\.\d{3})+)(?:,\d+)?(?!\d)', text)
    if match:
        print(f"${match.group(1).replace('.', '')}")
    
    # => $58610
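If a line can contain more than one amount, or you want the value as a number instead of a string, the same pattern can be compiled with `re.VERBOSE` (so each part carries the comment from the details list) and applied with `re.finditer`. This is only a sketch: `AMOUNT_RE` and `extract_amounts` are hypothetical names, the second sample line is made up to show a million-range amount, and the decimal part is discarded on the assumption from the question that it is always ",00".

```python
import re

# Same pattern as in the answer, written with re.VERBOSE so each piece is commented.
AMOUNT_RE = re.compile(
    r"""
    \$\s?                   # a $ sign and an optional whitespace
    (\d{1,3}(?:\.\d{3})+)   # Group 1: integer part using . as thousands separator
    (?:,\d+)?               # optional decimal part, e.g. ,00 (not captured)
    (?!\d)                  # no digit may follow
    """,
    re.VERBOSE,
)


def extract_amounts(line):
    """Return every amount in `line` as an int, dropping the decimal part (assumed ,00)."""
    return [int(m.group(1).replace(".", "")) for m in AMOUNT_RE.finditer(line)]


print(extract_amounts("Debt 1 of 2 for an amount of: $ 58.610,00, Unpaid"))
print(extract_amounts("Debt 2 of 2 for an amount of: $ 1.258.610,00, Unpaid"))
```

Converting with `int()` is only safe because the cents are dropped; if the decimal part ever matters, capturing it in a second group and combining the parts with `decimal.Decimal` would avoid losing it.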