I am trying to implement a negative lookahead for my task.
I have to put kgs into a negative lookahead after the numeric part.
So far I have tried this regex:
total\samount\s?\:?\s?[0-9\,\.]+\s(?!kgs)(?!\ kgs)
Given the text:
task1. total amount 5,887.99 kgs
task2. total amount 5,887.99kgs
task3. total amount 5,887.99 usd
task4. total amount 5,887.99usd
I want to match task3 and task4 but not task1 and task2.
So far I am able to reject task1 and task2 and match task3, but I am failing to match task4.
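You may emulate an atomic group, which Python's re module did not support before version 3.11.
For that purpose, you may use a capturing group inside a positive lookahead, followed by a backreference to it: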
total\s+amount\s*(?::\s*)?(?=(\d[\d,.]*))\1(?!\s*kgs)
Details
total\s+amount - total, 1+ whitespaces, amount
\s* - 0+ whitespaces
(?::\s*)? - an optional group matching 1 or 0 occurrences of : and 0+ whitespaces
(?=(\d[\d,.]*)) - a positive lookahead that matches and captures into Group 1 a digit and then 0 or more digits, dots or commas
\1 - the value of capturing group #1 (no backtracking is allowed into a backreference, thus the subsequent lookahead will only be triggered once, and if it fails, the whole match will fail)
(?!\s*kgs) - a negative lookahead that fails the match if there are 0+ whitespaces and then kgs immediately to the right of the current location.
In Python, use
pattern = r'total\s+amount\s*(?::\s*)?(?=(\d[\d,.]*))\1(?!\s*kgs)'
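To see why the lookahead-plus-backreference trick matters, here is a minimal sketch comparing it with a naive variant of the same pattern. The naive pattern and the variable names are only illustrative assumptions, not part of the suggested fix:

```python
import re

# Naive variant (illustration only): same pattern without the
# lookahead + backreference that emulates an atomic group.
naive_rx = r'total\s+amount\s*(?::\s*)?\d[\d,.]*(?!\s*kgs)'
atomic_rx = r'total\s+amount\s*(?::\s*)?(?=(\d[\d,.]*))\1(?!\s*kgs)'

text = 'task1. total amount 5,887.99 kgs'

# The greedy \d[\d,.]* gives back the final "9"; what follows is then "9 kgs",
# which is not "optional whitespace + kgs", so the negative lookahead passes.
naive_match = re.search(naive_rx, text)
print(naive_match.group() if naive_match else None)    # total amount 5,887.9

# The backreference has to match the whole captured number, so nothing
# can be given back and the match fails as intended.
atomic_match = re.search(atomic_rx, text)
print(atomic_match.group() if atomic_match else None)  # None
```

The naive pattern reports a truncated amount for task1, which is exactly the backtracking the emulated atomic group prevents.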
NOTE: With the PyPI regex module, which supports atomic groups and possessive quantifiers, you may just use
total\s+amount\s*(?::\s*)?\d[\d,.]*+(?!\s*kgs)
#                                 ^^
See the regex demo (PHP option is set since this will have the same behavior in Python code).
The *+ (0 or more) quantifier is possessive: once the digits, commas and dots are matched, the engine will never backtrack into them, and the negative lookahead check will only be performed once.
import re
import regex  # third-party package: pip install regex

texts = [
    'task1. total amount 5,887.99 kgs',
    'task2. total amount 5,887.99kgs',
    'task3. total amount 5,887.99 usd',
    'task4. total amount 5,887.99usd',
]
# re: atomic group emulated with a capturing lookahead plus a backreference
re_rx = r'total\s+amount\s*(?::\s*)?(?=(\d[\d,.]*))\1(?!\s*kgs)'
# regex: possessive *+ quantifier
regex_rx = r'total\s+amount\s*(?::\s*)?\d[\d,.]*+(?!\s*kgs)'
for s in texts:
    m_rx = re.search(re_rx, s)
    if m_rx:
        print("'", m_rx.group(), "' matched in '", s, "' with re pattern", sep="")
    m_regex = regex.search(regex_rx, s)
    if m_regex:
        print("'", m_regex.group(), "' matched in '", s, "' with regex pattern", sep="")
Output:
'total amount 5,887.99' matched in 'task3. total amount 5,887.99 usd' with re pattern
'total amount 5,887.99' matched in 'task3. total amount 5,887.99 usd' with regex pattern
'total amount 5,887.99' matched in 'task4. total amount 5,887.99usd' with re pattern
'total amount 5,887.99' matched in 'task4. total amount 5,887.99usd' with regex pattern
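As a side note, Python 3.11 added possessive quantifiers and atomic groups to the built-in re module, so on recent interpreters the possessive pattern may work without the third-party package. A minimal sketch, assuming the names below (TEXTS, pick_pattern) are purely illustrative helpers:

```python
import re
import sys

TEXTS = [
    'task1. total amount 5,887.99 kgs',
    'task2. total amount 5,887.99kgs',
    'task3. total amount 5,887.99 usd',
    'task4. total amount 5,887.99usd',
]

LOOKAHEAD_RX = r'total\s+amount\s*(?::\s*)?(?=(\d[\d,.]*))\1(?!\s*kgs)'
POSSESSIVE_RX = r'total\s+amount\s*(?::\s*)?\d[\d,.]*+(?!\s*kgs)'

def pick_pattern():
    """Return the possessive pattern where re can compile it, else the fallback."""
    # Before Python 3.11, re rejects *+ at compile time, so the
    # lookahead + backreference emulation is used instead.
    return POSSESSIVE_RX if sys.version_info >= (3, 11) else LOOKAHEAD_RX

matched = [s for s in TEXTS if re.search(pick_pattern(), s)]
print(matched)
```

Either branch should keep only the task3 and task4 lines, since both patterns forbid backtracking into the numeric part.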