I'm trying to strip a bunch of website URLs down to their domain names, e.g.:
https://www.facebook.org/hello
becomes facebook.org.
I'm using this regex pattern:
(https?:\/\/)?([wW]{3}\.)?([\w]*.\w*)([\/\w]*)
This catches most cases, but occasionally there are websites such as:
http://www.xxxx.wordpress.com/hello
which I want to strip to xxxx.wordpress.com.
How can I identify those cases while still identifying all other normal entries?
Your expression seems to work fine and outputs what you want. I only added the i flag and slightly modified it to:
(https?:\/\/)?([w]{3}\.)?(\w*.\w*)([\/\w]*)
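One detail worth knowing: the `.` inside the third group is unescaped, so it matches any character, and the group only covers one dot-separated pair. Longer hostnames still come out right under substitution because the pattern simply matches again at the next dot. Here is a minimal sketch that lists the individual matches (the helper name `show_matches` is hypothetical, just for illustration):

```python
import re

# Pattern from this answer; note the unescaped "." inside group 3.
PATTERN = re.compile(r"(https?:\/\/)?([w]{3}\.)?(\w*.\w*)([\/\w]*)", re.IGNORECASE)

def show_matches(url):
    """Return group 3 of every successive match found in url."""
    return [m.group(3) for m in PATTERN.finditer(url)]

print(show_matches("https://www.facebook.org/hello"))
# → ['facebook.org']
print(show_matches("http://www.xxxx.wordpress.com/hello"))
# → ['xxxx.wordpress', '.com']
```

Replacing each match with its third group glues those pieces back together, which is why the subdomain case ends up as xxxx.wordpress.com.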
If this isn't the expression you wanted, you can modify and test your expressions on regex101.com.
You can also visualize them on jex.im. The Python test below applies the expression to three sample URLs:
# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility
import re
regex = r"(https?:\/\/)?([w]{3}\.)?(\w*.\w*)([\/\w]*)"
test_str = ("https://www.facebook.org/hello\n"
            "http://www.xxxx.wordpress.com/hello\n"
            "http://www.xxxx.yyy.zzz.wordpress.com/hello")
subst = "\\3"
# You can limit the number of replacements with the count argument (0 means replace all)
result = re.sub(regex, subst, test_str, count=0, flags=re.MULTILINE | re.IGNORECASE)
if result:
    print(result)
# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
The same test in JavaScript:
const regex = /(https?:\/\/)?([w]{3}\.)?(\w*.\w*)([\/\w]*)/gmi;
const str = `https://www.facebook.org/hello
http://www.xxxx.wordpress.com/hello
http://www.xxxx.yyy.zzz.wordpress.com/hello`;
const subst = `$3`;
// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);
console.log('Substitution result: ', result);
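One caveat with the expression above: because it matches repeatedly along the string, a dot in the path leaks into the result (https://www.facebook.org/hello.html would come out as facebook.org.html). If your URLs can contain such paths, a stricter pattern may be safer. The following is only a sketch under my own assumptions: one URL per call, and no user:password@ part in the URL. The names `HOST_RE` and `strip_to_domain` are hypothetical.

```python
import re

# Assumed stricter pattern: optional scheme, optional "www.", then capture
# everything up to the first "/", ":", "?", "#" or whitespace character.
HOST_RE = re.compile(r"^(?:https?://)?(?:www\.)?([^/:?#\s]+)", re.IGNORECASE)

def strip_to_domain(url):
    """Return the host without scheme, leading "www." or path, or None if nothing matches."""
    match = HOST_RE.match(url.strip())
    return match.group(1) if match else None

for url in ("https://www.facebook.org/hello",
            "http://www.xxxx.wordpress.com/hello",
            "https://www.facebook.org/hello.html?x=1"):
    print(strip_to_domain(url))
```

Since the pattern is anchored at the start and stops at the first path, port, or query delimiter, it captures the whole hostname in one match no matter how many subdomains there are. If you would rather not maintain a regex at all, Python's standard urllib.parse.urlsplit can extract the hostname, though it expects the scheme to be present.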