Python Regular Expression to match specific currency format


I'm trying to write a regular expression in Python 3.4 that reads a text file of potential prices and matches only the ones that are validly formatted.

The requirements are that the price be in $X.YY or $X format where X must be greater than 0.

Invalid formats include $0.YY, $.YY, $X.Y, $X.YYY

So far this is what I have:

import re
from sys import argv

FILE = 1

# Read the whole input file; the with-block closes it automatically
with open(argv[FILE], 'r') as file:
    string = file.read()

price = re.compile(r"""
                       (\$      # dollar sign
                       [1-9]    # first digit must be non-zero
                       \d*)     # followed by 0 or more digits
                       (\.      # cent portion starts with a literal dot
                       \d{2}    # exactly 2 digits for cents
                       )?       # the cent portion is optional
                       """, re.X)

valid_prices = price.findall(string)
print(valid_prices)

This is the file I am using to test right now:

test.txt

 $34.23 $23 $23.23 $2 $2313443.23 $3422342 $02394 $230.232 $232.2 $05.03

Current output:

[('$34', '.23'), ('$23', ''), ('$23', '.23'), ('$2', ''), ('$2313443', '.23'), ('$3422342', ''), ('$230', '.23'), ('$232', '')]

It currently matches $230.232 (as $230.23) and $232.2 (as $232), when both should be rejected.

I am separating the dollar portion and the cent portion into different groups to do further processing later on. That is why my output is a list of tuples.

One catch here is that I do not know what delimiter, if any, will be used in the input file.

I am new to regular expressions and would really appreciate some help. Thank you!


Solution

  • If it really isn't clear which delimiter will be used, the only check that makes sense to me is that the price is not followed by a digit or a dot:

    \$[1-9]\d*(\.\d\d)?(?![\d.])
    

    https://regex101.com/r/jH2dN5/1
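
To keep the dollar and cent portions in separate groups, as the question asks, the same pattern can be written with two capture groups in verbose mode. This is only a minimal sketch: the names `PRICE` and `find_prices` are made up for illustration, and the sample string is the test.txt line from the question.

```python
import re

# Same pattern as in the answer, with the dollar and cent parts
# captured separately. PRICE and find_prices are illustrative names.
PRICE = re.compile(r"""
    (\$[1-9]\d*)    # dollar sign, non-zero first digit, then any digits
    (\.\d{2})?      # optional cent portion: a dot and exactly two digits
    (?![\d.])       # must not be followed by another digit or a dot
""", re.X)


def find_prices(text):
    """Return a list of (dollars, cents) tuples for each valid price."""
    return PRICE.findall(text)


sample = " $34.23 $23 $23.23 $2 $2313443.23 $3422342 $02394 $230.232 $232.2 $05.03"
print(find_prices(sample))
```

On the sample line this prints `[('$34', '.23'), ('$23', ''), ('$23', '.23'), ('$2', ''), ('$2313443', '.23'), ('$3422342', '')]`, so $230.232 and $232.2 are no longer matched. One thing to be aware of: because a dot is treated as "part of a number", a price directly followed by a full stop (for example at the end of a sentence) would also be rejected.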