I have a list of strings. If any word from the list occurs in a line of a document,
I want to output the matching word together with the number that appears in that line, usually right after the matching word. The word and the number are usually separated by a space
or a colon (:).
Example from document:
Expedien: 1-21-212-16-26
My list:
my_list = ['Reference', 'Ref.', 'tramite', 'Expedien']
The digits of the number in the matching line may be separated by hyphens (-)
or not separated at all.
Example: 1-21-22-45
or RE9833
In the second case, RE9833
should be returned in full (not only the digits) if a matching word from the list is found in the line.
How can I write a regex in Python for this?
Input file:
$ cat input_file
Expedien: 1-21-212-16-26 #other garbage
Reference RE9833 #tralala
abc
123
456
Ref.: UV1234
tramite 1234567
Ref.:
Sample:
import re
my_list = ['Reference', 'Ref.', 'tramite', 'Expedien']
#open the file as input
with open('input_file','r') as infile:
    #create an empty dict to store the pairs
    #that we will extract from the file
    res = dict()
    #for each input line
    for line in infile:
        #the only place we will use regex in this code
        #we split the input line into a list of strings using
        #as separator an optional : followed by some whitespace
        elems = re.split(r'(?::)?\s+', line)
        #we test that we have at least 2 elements
        #if not we continue with the following line
        if len(elems) >= 2:
            contains = False
            #tmp will store the key identified on this line
            tmp = ''
            #we go through all the strings present in this list of strings
            for elem in elems:
                #when we enter this if we have already found the key and we have the value
                #at this iteration
                if contains:
                    #we store it in the dict
                    #reset the check and leave this loop
                    res.update({tmp : elem})
                    contains = False
                    break
                #we check if the elem is in my_list
                if elem in my_list:
                    #if this is the case
                    #we set contains to true and we save the key in tmp
                    contains = True
                    tmp = elem
print(res)
Output:
$ python find_list.py
{'tramite': '1234567', 'Reference': 'RE9833', 'Expedien': '1-21-212-16-26', 'Ref.': ''}
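Note the empty value for 'Ref.' in the output above. It appears to come from the last input line, Ref.:, which contains the keyword but no value and therefore overwrites the UV1234 stored earlier. A minimal sketch of what the split produces for those two lines (the lines are hard-coded here for illustration, and split_line is a hypothetical helper name, not part of the sample):

```python
import re

# Same separator as in the sample: an optional ':' followed by whitespace.
SEPARATOR = re.compile(r'(?::)?\s+')

def split_line(line):
    """Split one input line the same way the sample does."""
    return SEPARATOR.split(line)

# A line read from a file normally ends with '\n', which is why
# every result ends with an empty string.
for line in ['Ref.: UV1234\n', 'Ref.:\n']:
    print(repr(line), '->', split_line(line))
# 'Ref.: UV1234\n' -> ['Ref.', 'UV1234', '']
# 'Ref.:\n' -> ['Ref.', '']
```

If the earlier value should be kept, one option would be to store the pair only when elem is non-empty.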
Regex demo: https://regex101.com/r/kSmLzW/3/
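For comparison, here is a sketch of a single-regex variant. It is not necessarily the pattern used in the demo linked above. It assumes that a value consists only of letters, digits and hyphens and contains at least one digit; extract_pairs and the hard-coded sample lines are illustrative names, not part of the original code.

```python
import re

my_list = ['Reference', 'Ref.', 'tramite', 'Expedien']

# Escape each word so that the '.' in 'Ref.' is matched literally,
# then join the words into a single alternation.
keywords = '|'.join(re.escape(word) for word in my_list)

# Keyword, then any mix of ':' and whitespace, then a value made of
# letters, digits and '-' that contains at least one digit (assumption).
pattern = re.compile(
    r'(' + keywords + r')[:\s]+((?=[A-Za-z-]*\d)[A-Za-z0-9-]+)'
)

def extract_pairs(lines):
    """Return a list of (keyword, value) tuples, one per matching line."""
    pairs = []
    for line in lines:
        match = pattern.search(line)
        if match:
            pairs.append((match.group(1), match.group(2)))
    return pairs

sample = [
    'Expedien: 1-21-212-16-26 #other garbage',
    'Reference RE9833 #tralala',
    'abc',
    'Ref.: UV1234',
    'tramite 1234567',
    'Ref.:',
]
print(extract_pairs(sample))
# [('Expedien', '1-21-212-16-26'), ('Reference', 'RE9833'),
#  ('Ref.', 'UV1234'), ('tramite', '1234567')]
```

Because the value must contain a digit, the bare Ref.: line produces no match here, so it cannot overwrite an earlier value. Returning a list of tuples instead of a dict also keeps repeated keywords rather than collapsing them.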