I have a column containing a list of redirect URLs from Google Custom Search results. I would like to extract the external domain from each of these combined URLs.
Example:
https://www.google.com/url?client=internal-element-cse&cx=3c360356&q=https://examplesite1.co.uk/aa-vv--cc-dd-gggg-/&sa=U&ved=2ahUKEwjj1cvJ79PuAhXBHc0KHRgvBLsgQIAhAC&usg=AOvVaw2vIHUiy31YKWs5c41Q
https://www.google.com/url?client=internal-element-cse&cx=3c360356&q=http://www.exmaplesite2.co.uk/wp-content/uploads/2016/12/research-paper.pdf&sa=U&ved=2ahUKEwiphLKMi80KHcLUCMAQFjAFegQIARAC&usg=AOvVawkm-bXjmxsPxLQ9w3
https://www.google.com/url?client=internal-element-cse&cx=3c360356&q=https://examplesite-3.com/home/en/aaa-bbb/38376&sa=U&ved=2ahUKEwixq4K7qttXEKHTOEClsQFjAAegQIARAB&usg=AOvVaw2ouHhfNNTPV
From the above URLs, I would like to extract the external domain names.
Expected results for the above examples:
examplesite1.co.uk
www.exmaplesite2.co.uk
examplesite-3.com
I am able to do this in Google Sheets, but I need a regex so that I can use it in Google Data Studio.
Thanks.
You may use this regex with an additional negative lookbehind:
(?<=(?<!^https)://)[^/]+
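As a quick sanity check, here is a minimal Python sketch that runs this pattern against the three example URLs. It assumes Python's re module, which accepts these lookbehinds because both are fixed-width. Google Data Studio uses a different engine (RE2), so this only checks the pattern's logic, not its compatibility there. The names URLS and extract_domain are illustrative only.

```python
import re

# Lookbehind pattern from the answer. Python's re accepts it because both
# lookbehinds are fixed-width: ^https spans 5 chars, (?<!...):// spans 3.
DOMAIN_RE = re.compile(r"(?<=(?<!^https)://)[^/]+")

# The three example redirect URLs from the question (hypothetical test data).
URLS = [
    "https://www.google.com/url?client=internal-element-cse&cx=3c360356&q=https://examplesite1.co.uk/aa-vv--cc-dd-gggg-/&sa=U&ved=2ahUKEwjj1cvJ79PuAhXBHc0KHRgvBLsgQIAhAC&usg=AOvVaw2vIHUiy31YKWs5c41Q",
    "https://www.google.com/url?client=internal-element-cse&cx=3c360356&q=http://www.exmaplesite2.co.uk/wp-content/uploads/2016/12/research-paper.pdf&sa=U&ved=2ahUKEwiphLKMi80KHcLUCMAQFjAFegQIARAC&usg=AOvVawkm-bXjmxsPxLQ9w3",
    "https://www.google.com/url?client=internal-element-cse&cx=3c360356&q=https://examplesite-3.com/home/en/aaa-bbb/38376&sa=U&ved=2ahUKEwixq4K7qttXEKHTOEClsQFjAAegQIARAB&usg=AOvVaw2ouHhfNNTPV",
]


def extract_domain(url):
    """Return the host after the first :// not preceded by a leading https."""
    match = DOMAIN_RE.search(url)
    return match.group(0) if match else None


for url in URLS:
    print(extract_domain(url))
```

Note that the nested lookbehind only skips the leading URL when it starts with https; a redirect URL beginning with plain http:// would return www.google.com instead.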
RegEx Details:
(?<=(?<!^https)://): Positive lookbehind to assert that we have :// before the current position. Additionally, the nested negative lookbehind (?<!^https) asserts that we don't have https at the start of the line before ://, thus skipping the URL at the start.
[^/]+: Match 1+ of any character that is not /
Update: As per the comments below, lookbehind is not supported in Google Data Studio, hence we can use this regex:
.https?://([^/]+)
Then grab the domain name from capture group #1.
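A minimal Python sketch of this lookbehind-free version, again over the three example URLs (the names URLS and extract_domain are illustrative). Since the pattern uses no lookbehind, it should also be valid RE2 syntax. In Data Studio, REGEXP_EXTRACT returns the first capture group, so something like REGEXP_EXTRACT(your_url_field, '.https?://([^/]+)') should work, where your_url_field is a placeholder for your own field.

```python
import re

# Lookbehind-free pattern: the leading . consumes one character before the
# scheme, and capture group 1 holds the domain.
DOMAIN_RE = re.compile(r".https?://([^/]+)")

# The three example redirect URLs from the question (hypothetical test data).
URLS = [
    "https://www.google.com/url?client=internal-element-cse&cx=3c360356&q=https://examplesite1.co.uk/aa-vv--cc-dd-gggg-/&sa=U&ved=2ahUKEwjj1cvJ79PuAhXBHc0KHRgvBLsgQIAhAC&usg=AOvVaw2vIHUiy31YKWs5c41Q",
    "https://www.google.com/url?client=internal-element-cse&cx=3c360356&q=http://www.exmaplesite2.co.uk/wp-content/uploads/2016/12/research-paper.pdf&sa=U&ved=2ahUKEwiphLKMi80KHcLUCMAQFjAFegQIARAC&usg=AOvVawkm-bXjmxsPxLQ9w3",
    "https://www.google.com/url?client=internal-element-cse&cx=3c360356&q=https://examplesite-3.com/home/en/aaa-bbb/38376&sa=U&ved=2ahUKEwixq4K7qttXEKHTOEClsQFjAAegQIARAB&usg=AOvVaw2ouHhfNNTPV",
]


def extract_domain(url):
    """Return capture group 1 of the first match, or None if nothing matches."""
    match = DOMAIN_RE.search(url)
    return match.group(1) if match else None


for url in URLS:
    print(extract_domain(url))
```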
The . placed before https?:// ensures that we don't match a URL at the start of the line, because . must consume at least one character before http.
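To see why the leading . matters, this small Python sketch compares the pattern with and without it on the third example URL. The variable names are illustrative only.

```python
import re

# Third example redirect URL from the question.
REDIRECT = "https://www.google.com/url?client=internal-element-cse&cx=3c360356&q=https://examplesite-3.com/home/en/aaa-bbb/38376&sa=U&ved=2ahUKEwixq4K7qttXEKHTOEClsQFjAAegQIARAB&usg=AOvVaw2ouHhfNNTPV"

# Without the leading ".", the first match is the Google redirect host itself.
without_dot = re.search(r"https?://([^/]+)", REDIRECT).group(1)

# With the leading ".", a match needs at least one character before "http",
# so the scheme at position 0 can never be matched.
with_dot = re.search(r".https?://([^/]+)", REDIRECT).group(1)

# A redirect URL with no embedded link therefore yields no match at all.
no_embedded = re.search(r".https?://([^/]+)", "https://www.google.com/url?client=internal-element-cse")

print(without_dot)  # www.google.com
print(with_dot)     # examplesite-3.com
print(no_embedded)  # None
```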