I'm trying to write a regular expression in Python 3.4 that takes a text file of potential prices as input and matches only the validly formatted ones.
The price must be in $X.YY or $X format, where X must be greater than 0 and its first digit must be non-zero.
Invalid formats include $0.YY, $.YY, $X.Y, and $X.YYY.
So far this is what I have:
import re
from sys import argv
FILE = 1
with open(argv[FILE], 'r') as file:
    string = file.read()
price = re.compile(r"""   # beginning of string
    (\$                   # dollar sign
    [1-9]                 # first digit must be non-zero
    \d*)                  # followed by 0 or more digits
    (\.                   # optional cent portion
    \d{2}                 # only 2 digits allowed for cents
    )?                    # end of string""", re.X)
valid_prices = price.findall(string)
print(valid_prices)
This is the file I am using to test right now:
test.txt
$34.23 $23 $23.23 $2 $2313443.23 $3422342 $02394 $230.232 $232.2 $05.03
Current output:
[('$34', '.23'), ('$23', ''), ('$23', '.23'), ('$2', ''), ('$2313443', '.23'), ('$3422342', ''), ('$230', '.23'), ('$232', '')]
It is currently matching $230.232 and $232.2 when these should be rejected.
I am separating the dollar portion and the cent portion into different groups to do further processing later on. That is why my output is a list of tuples.
One catch here is that I do not know what delimiter, if any, will be used in the input file.
I am new to regular expressions and would really appreciate some help. Thank you!
If it's really not clear which delimiter will be used, to me it would only make sense to require that the match is followed by something that is "not a digit and not a dot", using a negative lookahead:
\$[1-9]\d*(\.\d\d)?(?![\d.])
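To keep the dollar and cent portions in separate groups, as in the question, the same idea can be sketched like this. This is only a minimal sketch: the `sample` string is the test.txt content from the question pasted inline instead of read from a file, and the grouping is my assumption about how the further processing wants its tuples.

```python
import re

# Same pattern as above, with the dollar part wrapped in its own group so
# that findall still returns (dollars, cents) tuples as in the question.
price = re.compile(r"""
    (\$[1-9]\d*)     # dollar sign, non-zero first digit, then any digits
    (\.\d\d)?        # optional cents: a dot and exactly two digits
    (?![\d.])        # must not be followed by another digit or a dot
""", re.X)

# Contents of test.txt from the question, inlined for the example.
sample = "$34.23 $23 $23.23 $2 $2313443.23 $3422342 $02394 $230.232 $232.2 $05.03"

valid_prices = price.findall(sample)
print(valid_prices)
```

With this pattern $230.232 and $232.2 no longer produce a partial match, because every shorter candidate is still followed by a digit or a dot. One side effect to be aware of: a price followed directly by a sentence-ending period, such as "$5.", is rejected as well.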