RegEx for three tricky string patterns

I want to extract an Instagram username from a profile page.

The problem is that users write their usernames in different ways, which makes it tricky to capture the pattern with a regex.

All the patterns I want to match are shown below (users post their Instagram username in one of these forms):

IG: @user-name
I.G.: @user-name
Instagram: @user-name

I came up with the logic below, but I got completely lost searching the regex documentation and examples for something that fits this search.

My logic: ignorecase (IG or I.G. or I.G or instagram) + (possible space) + (possible :) + (possible space) + (possible @) + (username with - or _ in it) + (ends with space or new line or full stop)

In short, I'd like to capture the word (the username) that follows "instagram", "IG", or "I.G", excluding unnecessary characters such as ":", "@", or spaces.

How can I do this with a regex? A one-liner would be an efficient yet elegant answer.

P.S. I want to do this with Python's re module.

Solution

My logic: ignorecase(IG or I.G. or I.G or instagram) + (possible space) + (possible :) + (possible space) + (possible @) + (username with - or _ in it) + (ends with space or new line or full stop)

First, the prefix part (IG, I.G., and Instagram). Pass re.I (or re.IGNORECASE) to re.compile to ignore case, and use | (alternation, i.e. "or") to accept either form of the prefix.

r'(instagram|I\.?G\.?)'

The . is escaped as \. so that it matches a literal dot, and the question mark ? after it means "zero or one", so each dot is optional. The same ? makes the space \s and the colon : optional.

prefix = re.compile(r'(instagram|I\.?G\.?)\s?:?', re.IGNORECASE)
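As a quick check of the prefix step on its own, here is a minimal sketch that runs a prefix pattern with optional dots against a few sample prefixes. The names samples and matched, and the sample strings themselves, are made up for illustration.

```python
import re

# Prefix only: "instagram", or "IG" with optional dots, followed by an
# optional whitespace character and an optional colon.
# re.IGNORECASE makes the whole pattern case-insensitive.
prefix = re.compile(r'(instagram|I\.?G\.?)\s?:?', re.IGNORECASE)

# Hypothetical sample prefixes; the last one should not match.
samples = ['IG:', 'I.G.:', 'I.G :', 'instagram:', 'Insta:']

# fullmatch() requires the pattern to cover the entire string.
matched = {s: bool(prefix.fullmatch(s)) for s in samples}

for s, ok in matched.items():
    print(f'{s!r:15} {ok}')
```

Using fullmatch() here rather than search() keeps the check strict: a sample only counts if the prefix pattern accounts for every character in it.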

Next, the username. Put a question mark ? after @ to mark it as optional. The two (.*) groups are the first and last (if any) parts of the username, separated by a dash or underscore, (-|_)?, which is also optional.

username = re.compile(r'@?(.*)(-|_)?(.*)\s?$')

Putting it all together:

username_regex = re.compile(r'^(instagram|I\.?G\.?)\s?:?\s?(@?.*((-|_).*)?\s?)$', re.IGNORECASE)
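Note that .* accepts any characters, so the combined pattern captures everything after the prefix, including the @. If only the bare username is wanted, one possible alternative is sketched below. It is not the pattern above: it assumes a username consists only of letters, digits, underscores, and hyphens, and it requires at least one space, colon, or @ between the prefix and the username. The names ig_username and extract_username are hypothetical.

```python
import re

# Assumption: a username is one or more word characters (letters, digits,
# underscore) or hyphens. A full stop, space, or newline ends the capture.
ig_username = re.compile(
    r'\b(?:instagram|I\.?G\.?)'  # prefix, case-insensitive, dots optional
    r'[\s:@]+'                   # at least one separator: space, colon, or @
    r'([\w-]+)',                 # group 1: the bare username
    re.IGNORECASE,
)

def extract_username(text):
    """Return the first username found in text, or None if there is none."""
    m = ig_username.search(text)
    return m.group(1) if m else None

print(extract_username('IG: @user-name'))
print(extract_username('Instagram: @user_name.'))
print(extract_username('ignore this'))
```

Because the colon and the @ are both optional in the original requirements, a sentence such as "IG is great" would still yield "is" as a username, so some false positives are hard to avoid without more context.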

I ran some tests against this regex; here is the code.

import re

username_regex = re.compile(r'^(instagram|I\.?G\.?)\s?:?\s?(@?.*((-|_).*)?\s?)$', re.IGNORECASE)

tests = [
    'I.G.: @first-last',
    'I.G: @first-last',
    'I.g: @first-last',
    'I.g.: @first-last',
    'i.G: @first-last',
    'i.G.: @n-last',
    'i.g: @first-last',
    'i.g. @first-last',
    'I.G.:@first-last',
    'I.G@first-last',
    'I.g @first-last',
    'I.gfirst-last',
    'i.G: first_last',
    'i.G. first_last',
    'ig: first_last',
    'i.g. @first-last',
    'inStagram: @first-last',
    'instAgram: @first-last',
    'INSTAGRAM: @first-last',
]

not_matched = 0
for test in tests:
    searched = username_regex.search(test)

    if searched:
        print("MATCH ->", test)
        print(searched.group(), '\n\n')
    else:
        print("========", test)
        not_matched += 1

print(not_matched)
# >> 0

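To get the prefix and the username separately, use the group() and groups() methods. For example, for the input 'I.G: @first-last' (note that the colon is outside the first group, so it is not part of the prefix):

searched = username_regex.search('I.G: @first-last')

searched.groups()
# ('I.G', '@first-last', None, None)

searched.group(0) # 'I.G: @first-last'

# If you want to get the prefix
searched.group(1) # 'I.G'

# If you want to get the username
searched.group(2) # '@first-last'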
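Since group(2) still carries the leading @ (and possibly trailing whitespace or a full stop), the captured text can be cleaned up after matching. The sketch below is one way to do that with plain string methods; bare_username is a hypothetical helper name, and stripping a trailing full stop assumes usernames never end with one.

```python
import re

username_regex = re.compile(
    r'^(instagram|I\.?G\.?)\s?:?\s?(@?.*((-|_).*)?\s?)$', re.IGNORECASE)

def bare_username(line):
    """Return the username from line without '@', or None if no match."""
    m = username_regex.search(line)
    if m is None:
        return None
    # Drop surrounding whitespace, then the leading '@', then a trailing
    # full stop (assumption: a username does not end with '.').
    return m.group(2).strip().lstrip('@').rstrip('.')

print(bare_username('I.G: @first-last'))
print(bare_username('Instagram: first_last. '))
print(bare_username('hello'))
```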

NOTE: I may have made a mistake somewhere. If you find something wrong, please let me know. Thanks.