Tags: python, regex, regex-group

How to create a list of tuples containing the matches of multiple regular expressions


I am currently working on an assignment that requires us to extract phone numbers, emails, and websites from a text document. The lecturer wants the output as a list of tuples, each containing the starting index, the length, and the matched text. For example: [(1,10,'0909900008'), (35,16,'contact@viva.com'), ...]

Since there are three different kinds of match to find, how can I put all of them into one list of tuples? I have written the three regular expressions, but I can't work out how to combine their results into a single list. Should I create a new expression that describes all three? Thanks for your help.

import re

result = []

# Match with RE
email_pattern = r'[\w\.-]+@[\w\.-]+(?:\.[\w]+)+'
email = re.findall(email_pattern, string)
for match in re.finditer(email_pattern, string):
    print(match.start(), match.end() - match.start(), match.group())

phone_pattern = r'\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}'
phone = re.findall(phone_pattern, string)
for match in re.finditer(phone_pattern, string):
    print(match.start(), match.end() - match.start(), match.group())

website_pattern = r'(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}|www\.[a-zA-Z0-9]+\.[^\s]{2,})'
web = re.findall(website_pattern, string)
for match in re.finditer(website_pattern, string):
    print(match.start(), match.end() - match.start(), match.group())

My output:

# Text document
should we use regex more often? let me know at 012345678@student.eng or bbx@gmail.com. To further notice, contact Khoi at 0957507468 or accessing
https://web.de or maybe www.google.com, or Mr.Q at 0912299922.

# Output
47 21 012345678@student.eng
72 13 bbx@gmail.com
122 10 0957507468
197 10 0912299922
146 14 https://web.de
170 15 www.google.com,


Solution

  • Instead of printing each match, append it to the result list as a tuple and print the list once at the end, i.e. change

    print(match.start(), match.end() - match.start(), match.group())
    

    to

    result.append((match.start(), match.end() - match.start(), match.group()))
    

    Do the same in the other two loops, then at the end

    print(result)
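
Putting the pieces together, a minimal sketch could look like the following. It reuses the three patterns from the question unchanged. The names PATTERNS, collect_matches, and text are illustrative and not part of the original code. Sorting by starting index is an assumption about the order the assignment expects; drop the sorted() call to keep the matches grouped by pattern instead.

```python
import re

# The three patterns from the question, unchanged (email, phone, website).
PATTERNS = [
    r'[\w\.-]+@[\w\.-]+(?:\.[\w]+)+',
    r'\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}',
    r'(https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}'
    r'|www\.[a-zA-Z0-9][a-zA-Z0-9-]+[a-zA-Z0-9]\.[^\s]{2,}'
    r'|https?:\/\/(?:www\.|(?!www))[a-zA-Z0-9]+\.[^\s]{2,}'
    r'|www\.[a-zA-Z0-9]+\.[^\s]{2,})',
]


def collect_matches(text, patterns=PATTERNS):
    """Return (start, length, match) tuples for every pattern, ordered by start."""
    result = []
    for pattern in patterns:
        for match in re.finditer(pattern, text):
            result.append((match.start(), match.end() - match.start(), match.group()))
    # Assumption: the expected list is ordered by position in the text.
    return sorted(result)


# The sample text from the question (hypothetical variable name).
text = (
    "should we use regex more often? let me know at 012345678@student.eng or bbx@gmail.com. "
    "To further notice, contact Khoi at 0957507468 or accessing\n"
    "https://web.de or maybe www.google.com, or Mr.Q at 0912299922."
)

print(collect_matches(text))
```

Looping over a list of patterns avoids repeating the same for loop three times, and it keeps the three expressions separate, so there is no need to merge them into one large expression.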