I want to extract an Instagram username from a profile page.
The problem is that users choose how to write their username, so it is tricky to get the computer to match the pattern with a regex.
All of the patterns I want to search for are shown below (each user posts their Instagram username using one of them):
I came up with the logic below, but I got completely lost searching the regex documentation and examples for something suitable.
My logic: ignorecase (IG or I.G. or I.G or instagram) + (possible space) + (possible :) + (possible space) + (possible @) + (username with - or _ in it) + (ends with space or new line or full stop)
In short, I'd like to select the word (the username) after "instagram", "IG" or "I.G", excluding unnecessary characters like ":", "@" or spaces.
How can I do this with a regex? A one-liner would be an efficient yet elegant answer.
P.S. I want to do this with Python's re module.
My logic: ignorecase(IG or I.G. or I.G or instagram) + (possible space) + (possible :) + (possible space) + (possible @) + (username with - or _ in it) + (ends with space or new line or full stop)
First, the prefix part (IG and Instagram:). You can pass re.I (or re.IGNORECASE) as the flags argument of re.compile to ignore case, so that I.G and instagram match in any capitalization. Then use | (the regex "or") between the alternatives:
r'(instagram|I\.?G\.?)'
Here the . is escaped, and the question mark ? after it means the preceding item can appear once or not at all. Use ? the same way for the possible space \s and the possible colon :.
prefix = re.compile(r'(instagram|I\.?G\.?)\s?:?\s?', re.IGNORECASE)
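A quick way to check the prefix pattern on its own is to run it against a few sample strings with re.match, which only matches at the start of the string. The sample strings here are made up for illustration; the last one shows that \.? rejects a doubled dot.

```python
import re

# Prefix: "instagram" or I, optional dot, G, optional dot,
# then an optional space, optional colon and optional space.
prefix = re.compile(r'(instagram|I\.?G\.?)\s?:?\s?', re.IGNORECASE)

samples = ['IG: ', 'i.g. ', 'I.G:', 'Instagram : ', 'I..G']
results = {}
for s in samples:
    m = prefix.match(s)
    # group(1) is only the "IG"/"instagram" part, without space or colon
    results[s] = m.group(1) if m else None

print(results)
# {'IG: ': 'IG', 'i.g. ': 'i.g.', 'I.G:': 'I.G', 'Instagram : ': 'Instagram', 'I..G': None}
```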
And then the username. First, put the question mark ? after @ to make it optional. Then the two (.*) groups are meant to hold the first and last (if any) parts of the username, separated by either a dash or an underscore, (-|_)?, which is also optional.
username = re.compile(r'@?(.*)(-|_)?(.*)\s?$')
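Because .* is greedy, the first (.*) in this username pattern swallows the separator as well, so the (-|_) group never captures anything. The sketch below shows that, plus an alternative I'm suggesting (not the pattern above) that uses character classes instead of .* so the two parts really are split. The test strings are made up.

```python
import re

# Username pattern with two greedy (.*) groups around an optional (-|_)
greedy = re.compile(r'@?(.*)(-|_)?(.*)\s?$')
greedy_groups = greedy.match('@first-last').groups()
print(greedy_groups)  # ('first-last', None, '')

# Alternative: each part is letters/digits only, so the optional
# separator group can actually capture the - or _
split = re.compile(r'@?([A-Za-z0-9]+)(?:([-_])([A-Za-z0-9]+))?\s?$')
for s in ['@first-last', 'first_last', '@single']:
    print(s, '->', split.match(s).groups())
# @first-last -> ('first', '-', 'last')
# first_last -> ('first', '_', 'last')
# @single -> ('single', None, None)
```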
Putting it all together:
username_regex = re.compile(r'^(instagram|I\.?G\.?)\s?:?\s?(@?.*((-|_).*)?\s?)$', re.IGNORECASE)
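One thing worth checking: since the username part is @?.*, group 2 keeps the "@" and runs to the end of the string, so the question's rule that the username ends at a space, newline or full stop is not enforced. The input string below is made up for illustration.

```python
import re

# username_regex exactly as written above
username_regex = re.compile(r'^(instagram|I\.?G\.?)\s?:?\s?(@?.*((-|_).*)?\s?)$', re.IGNORECASE)

m = username_regex.search('IG: @first-last follow me.')
print(m.group(2))  # @first-last follow me.
```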
I've tested this regex; here is the code:
import re
username_regex = re.compile(r'^(instagram|I\.?G\.?)\s?:?\s?(@?.*((-|_).*)?\s?)$', re.IGNORECASE)
tests = [
    'I.G.: @first-last',
    'I.G: @first-last',
    'I.g: @first-last',
    'I.g.: @first-last',
    'i.G: @first-last',
    'i.G.: @n-last',
    'i.g: @first-last',
    'i.g. @first-last',
    'I.G.:@first-last',
    'I.G@first-last',
    'I.g @first-last',
    'I.gfirst-last',
    'i.G: first_last',
    'i.G. first_last',
    'ig: first_last',
    'i.g. @first-last',
    'inStagram: @first-last',
    'instAgram: @first-last',
    'INSTAGRAM: @first-last',
]
not_matched = 0
for test in tests:
    searched = username_regex.search(test)
    if searched:
        print("MATCH ->", test)
        print(searched.group(), '\n\n')
    else:
        print("========", test)
        not_matched += 1
print(not_matched)
# >> 0
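The tests above feed each string in on its own. On a whole profile text with several lines, ^ and $ only match at the very start and end of the string unless re.MULTILINE is added. A sketch, using a made-up bio string:

```python
import re

# username_regex as defined in the test code above
username_regex = re.compile(r'^(instagram|I\.?G\.?)\s?:?\s?(@?.*((-|_).*)?\s?)$', re.IGNORECASE)
# Same pattern, but ^ and $ also match at the start/end of every line
multiline_regex = re.compile(username_regex.pattern, re.IGNORECASE | re.MULTILINE)

bio = "Photographer\nI.G: @first-last\nLondon"  # made-up sample profile text
plain = username_regex.search(bio)
per_line = multiline_regex.search(bio)
print(plain)              # None
print(per_line.group(2))  # @first-last
```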
If you want to get the prefix and the username separately, you can use the group() and groups() methods. For example, for the test string 'I.G: @first-last':
searched = username_regex.search('I.G: @first-last')
searched.groups()
# ('I.G', '@first-last', None, None)
searched.group(0) # 'I.G: @first-last'
# If you want to get the prefix (the colon and spaces are matched outside group 1)
searched.group(1) # 'I.G'
# If you want to get the username
searched.group(2) # '@first-last'
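If you want the username without the "@" and want to scan a whole profile text, a possible variation (a sketch, not tested beyond the made-up sample below) is to move @? outside the username group, use named groups, restrict the username to [\w-]+ and replace ^/$ with \b. Following the question's rule that a full stop ends the username, "." is left out of the character class; adjust it if usernames in your data can contain periods. It can also give false positives for words that merely start with "ig".

```python
import re

# Hypothetical variant: named groups, "@" matched but not captured,
# username stops at anything that is not a letter, digit, _ or -
USERNAME_IN_TEXT = re.compile(
    r'\b(?P<prefix>instagram|i\.?g\.?)\s?:?\s?@?(?P<username>[\w-]+)',
    re.IGNORECASE,
)

profile = "Hi!\nIG: @first-last\nI.G. first_last.\ninstagram @Some_User and more"
found = [(m.group('prefix'), m.group('username'))
         for m in USERNAME_IN_TEXT.finditer(profile)]
print(found)
# [('IG', 'first-last'), ('I.G.', 'first_last'), ('instagram', 'Some_User')]
```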
NOTE: It is possible that I am wrong somewhere here. If you find something wrong, please let me know. Thanks.