Tags: regex, preg-match, regular-language

RegExp: extracting a link from a String that contains www more than once


When I try to extract links from a String such as

"hello world https://www.sample.com/voices/2020/my-sound-www.sample.com"

I get multiple links, because www appears more than once in the string. How can I prevent that?

output:

  1. https://www.sample.com/voices/2020/my-sound-www.sample.com
  2. www.sample.com

This output is incorrect: there should be one link, not two:

https://www.sample.com/voices/2020/my-sound-www.sample.com

My regex pattern:

r"((https?:www\.)|(https?:\/\/)|(www\.))[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9]{1,6}(\/[-a-zA-Z0-9()@:%_\+.~#?&\/=]*)?"

Solution

  • You can use

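    // `test` is the input String that contains the link.
    // A raw triple-quoted string lets the pattern contain both " and '.
    final reg = RegExp(r'''(?:https?:(?:\\?\/\\?\/|www\.)|www\.)[^\s<>"']*\.mp3''');
    final m = reg.firstMatch(test);
    print(m?.group(0)); // prints null when nothing matches
    // => https://www.caferilik.com/wp-content/uploads/2020/11/Anne-Baba-Biz-Suçluyuz-Muhafazakar-Ailelerde-Kuşak-Çatışması-Sesli-Kitap-www.caferilik.com_.mp3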
    

    Here, the pattern is

    (?:https?:(?:\\?\/\\?\/|www\.)|www\.)[^\s<>"']*\.mp3
    


    Details:

    • (?:https?:(?:\\?\/\\?\/|www\.)|www\.) - either http, followed by an optional s, then : and then either // (with an optional \ before each /) or www., or just www. on its own
    • [^\s<>"']* - zero or more chars other than whitespace, <, >, " and '
    • \.mp3 - a literal .mp3 string.
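
As a rough sketch of how the pieces above fit together, the snippet below runs the pattern with allMatches instead of firstMatch, so you can check that only one link comes back. The helper findLinks, the name anyLink and both input strings are made up for illustration. anyLink is an assumed variant (not part of the answer above) that drops the \.mp3 suffix for links that do not end in .mp3.

```dart
// The answer's pattern: prefix, then non-delimiter chars, then ".mp3".
final mp3Link =
    RegExp(r'''(?:https?:(?:\\?\/\\?\/|www\.)|www\.)[^\s<>"']*\.mp3''');

// Assumed variant: same prefix, then one or more non-delimiter chars.
final anyLink =
    RegExp(r'''(?:https?:(?:\\?\/\\?\/|www\.)|www\.)[^\s<>"']+''');

/// Hypothetical helper: every non-overlapping match of [pattern] in [text].
List<String> findLinks(RegExp pattern, String text) =>
    pattern.allMatches(text).map((m) => m.group(0)!).toList();

void main() {
  // Illustrative inputs, not taken from the original question.
  const audio =
      'listen: https://www.sample.com/voices/2020/my-sound-www.sample.com_.mp3 now';
  const page =
      'hello world https://www.sample.com/voices/2020/my-sound-www.sample.com';

  print(findLinks(mp3Link, audio)); // one element: the full .mp3 URL
  print(findLinks(anyLink, page)); // one element: the full URL
}
```

Because [^\s<>"'] consumes everything up to the next whitespace or quote, the second www. is swallowed as part of the first match and cannot start a match of its own.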