I currently have working code, but its one flaw is that I didn't write the Python regex in the optimal way.
The original text contains amounts in the thousands, hundreds of thousands, and millions. There are no meaningful decimals: the decimal part is always ",00".
Example line in the text:
Debt 1 of 2 for an amount of: $ 58.610,00, Unpaid
Right now the following code captures millions fine, but for amounts under 100,000 it skips one digit.
regex = r"(\d+).(\d+).(\d+),(\d+)"
match = re.search(regex, line, re.MULTILINE)
print("$" + match.group(1) + match.group(2) + match.group(3))
It captures like this:
$5860
But the target is this:
$58610
If the amount is in the millions, it is captured correctly. I wrote the regex that way because the currency I'm working with involves large amounts, so I constantly handle quantities like these.
Regards
You can use the following regex to extract your expected matches and remove the thousand separator afterwards:
\$\s?(\d{1,3}(?:\.\d{3})+)(?:,\d+)?(?!\d)
You need to get Group 1 value, remove periods from it and reappend $ at the start. Details:
\$ - a $ char
\s? - an optional whitespace
(\d{1,3}(?:\.\d{3})+) - Group 1: one to three digits, and then one or more occurrences (since you only want to match thousands and more) of . and three digits
(?:,\d+)? - an optional sequence of a comma and one or more digits
(?!\d) - no digit is allowed immediately on the right.
See the Python demo:
import re
text = 'Debt 1 of 2 for an amount of: $ 58.610,00, Unpaid'
match = re.search(r'\$\s?(\d{1,3}(?:\.\d{3})+)(?:,\d+)?(?!\d)', text)
if match:
    print(f"${match.group(1).replace('.', '')}")
    # => $58610
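As a rough sketch of why the original pattern loses a digit, and how the regex above could be wrapped for reuse: in the question's pattern the `.` is unescaped, so it matches any character, including a digit, and backtracking lets it swallow the `1` in `610`. The names `AMOUNT_RE`, `ORIGINAL_RE`, and `extract_amounts` are hypothetical, introduced here only for illustration, and returning integers is an assumption about what is wanted downstream.

```python
import re

# Pattern from the answer: "$", optional whitespace, 1-3 digits, one or more
# ".ddd" groups, an optional ",dd" decimal part, and no digit right after.
AMOUNT_RE = re.compile(r'\$\s?(\d{1,3}(?:\.\d{3})+)(?:,\d+)?(?!\d)')

# Pattern from the question, kept only to show where the digit goes missing.
# The unescaped "." matches ANY character, not just a literal period.
ORIGINAL_RE = re.compile(r'(\d+).(\d+).(\d+),(\d+)')


def extract_amounts(text):
    """Return every matched amount in text as an int (hypothetical helper)."""
    return [int(m.group(1).replace('.', ''))
            for m in AMOUNT_RE.finditer(text)]


line = 'Debt 1 of 2 for an amount of: $ 58.610,00, Unpaid'

# The second "." consumes the digit "1", so it never lands in a group.
print(ORIGINAL_RE.search(line).groups())
# ('58', '6', '0', '00')

print(extract_amounts(line))
# [58610]

# Several amounts on one line, including one in the millions.
print(extract_amounts('Debts: $ 1.250.000,00 and $ 58.610,00'))
# [1250000, 58610]
```

Note that, because of the `+` after the `.ddd` group, amounts below 1.000 (with no thousand separator) are deliberately not matched, in line with the "thousands and more" requirement.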