So I have several examples of raw text in which I have to extract the characters after 'Terms'. The common pattern I see is after the word 'Terms' there is a '\n' and also at the end '\n' I want to extract all the characters(words, numbers, symbols) present between these to \n but after keyword 'Terms'.
Some examples of text are given below:
1) \nTERMS \nDirect deposit; Routing #256078514, acct. #160935\n\n'
2) \nTerms\nDue on receipt\nDue Date\n1/31/2021
3) \nTERMS: \nNET 30 DAYS\n
The code I have written is given below:
def get_term_regex(s):
raw_text = s
term_regex1 = r'(TERMS\s*\\n(.*?)\\n)'
try:
if ('TERMS' or 'Terms') in raw_text:
pattern1 = re.search(term_regex1,raw_text)
#print(pattern1)
return pattern1
except:
pass
But I am not getting any output, as there is no match.
The expected output is:
1) Direct deposit; Routing #256078514, acct. #160935
2) Due on receipt
3) NET 30 DAYS
Any help would be really appreciated.
Try the following:
import re
text = '''1) \nTERMS \nDirect deposit; Routing #256078514, acct. #160935\n\n'
2) \nTerms\nDue on receipt\nDue Date\n1/31/2021
3) \nTERMS: \nNET 30 DAYS\n''' # \n are real new lines
for m in re.finditer(r'(TERMS|Terms)\W*\n(.*?)\n', text):
print(m.group(2))
Note that your regex could not deal with the third 'line' because there is a colon :
after TERMS
. So I replaced \s
with \W
.
('TERMS' or 'Terms') in raw_text
might not be what you want. It does not raise a syntax error, but it is just the same as 'TERMS' in raw_text
; when python evaluates the parenthesis part, both 'TERMS'
and 'Terms'
are all truthy, and therefore python just takes the last truthy value, i.e., 'Terms'
. The result is, TERMS
cannot be picked up by that part!
So you might instead want someting like ('TERMS' in raw_text) or ('Terms' in raw_text)
, although it is quite verbose.