php, regex, validation, url, filtering

Filter an array of URLs which must contain specific text and not contain other text


I want to extract specific links from a website.

The links look like this:

/topic/Funny/G1pdeJm

The links are always the same, except for the random characters at the end.

I'm having a hard time combining these two checks:

(preg_match("/^http:\/\//i",$str) || is_file($str))

and

(preg_match("/Funny(.*)/", $str) || is_file($str))

The first check extracts every link; the second extracts only the links that contain the /topic/Funny/* part.

Unfortunately, I can't combine them. I also want to block these links:

/topic/Funny/viral
/topic/Funny/time
/topic/Funny/top
/topic/Funny/top/week
/topic/Funny/top/month
/topic/Funny/top/year
/topic/Funny/top/all

Solution

  • You could try using a negative lookahead to "filter out" the URLs you don't want:

    .*\/Funny\/(?!viral|time|top\/week|top\/month|top\/year|top\/all|top(\n|$)).*
    

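The lookahead idea above can be sketched in PHP roughly as follows. This is a variant, not the answer's exact pattern. It assumes every blocked name (viral, time, top) should be matched as a whole path segment, so a random ID that merely begins with "top" or "time" still passes, whereas the pattern above only guards "top" that way. The function name filterFunnyLinks and the sample links are hypothetical.

```php
<?php
// Hypothetical helper: keeps only /topic/Funny/<id> links and drops the
// blocked sub-pages (viral, time, top, top/week, top/month, ...).
function filterFunnyLinks(array $links): array
{
    // "~" is used as the delimiter so the slashes need no escaping.
    // (?!(?:viral|time|top)(?:/|$)) rejects a blocked name only when it is
    // a complete path segment (followed by "/" or the end of the string).
    // [^/\s]+$ then requires exactly one final segment, the random ID.
    $pattern = '~/topic/Funny/(?!(?:viral|time|top)(?:/|$))[^/\s]+$~';

    // preg_grep() keeps the original keys, so reindex the result.
    return array_values(preg_grep($pattern, $links));
}

// Sample input (made up for illustration).
$links = [
    '/topic/Funny/G1pdeJm',
    '/topic/Funny/viral',
    '/topic/Funny/time',
    '/topic/Funny/top',
    '/topic/Funny/top/week',
    '/topic/Funny/top/all',
    '/topic/Sports/Ab12Cd3',
    '/topic/Funny/topXy12',
];

print_r(filterFunnyLinks($links));
```

Because the pattern is not anchored at the start, it should also accept absolute URLs such as http://example.com/topic/Funny/G1pdeJm (example.com is a placeholder host), which covers the first check from the question in the same expression.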