Filter array of urls containing a whitelisted substring and not containing any blacklisted substrings


I have the array shown below:

[
    'http://api.tweetmeme.com/imagebutton.gif?url=http://mashable.com/2010/09/25/trailmeme/', 
    'http://cdn.mashable.com/wp-content/plugins/wp-digg-this/i/gbuzz-feed.png',
    'http://mashable.com/wp-content/plugins/wp-digg-this/i/fb.jpg',
    'http://mashable.com/wp-content/plugins/wp-digg-this/i/diggme.png',
    'http://ec.mashable.com/wp-content/uploads/2009/01/bizspark2.gif',
    'http://cdn.mashable.com/wp-content/uploads/2010/09/web.png',
    'http://mashable.com/wp-content/uploads/2010/09/Screen-shot-2010-09-24-at-10.51.26-PM.png', 
    'http://cdn.mashable.com/wp-content/uploads/2009/02/bizspark.jpg',
    'http://feedads.g.doubleclick.net/~at/lxx00QTjYBaYojpnpnTa6MXUmh4/0/di',
    '',
    'http://feedads.g.doubleclick.net/~at/lxx00QTjYBaYojpnpnTa6MXUmh4/1/di',
    '',
    'http://feeds.feedburner.com/~ff/Mashable?i=0N_mvMwPHYk:j5Pmi_N-JQ8:D7DqB2pKExk',
    '',
    'http://feeds.feedburner.com/~ff/Mashable?i=0N_mvMwPHYk:j5Pmi_N-JQ8:V_sGLiPBpWU',
    '',
    'http://feeds.feedburner.com/~ff/Mashable?i=0N_mvMwPHYk:j5Pmi_N-JQ8:F7zBnMyn0Lo',
    '',
    'http://feeds.feedburner.com/~ff/Mashable?d=qj6IDK7rITs',
    '',
    'http://feeds.feedburner.com/~ff/Mashable?d=_e0tkf89iUM',
    '',
    'http://feeds.feedburner.com/~ff/Mashable?i=0N_mvMwPHYk:j5Pmi_N-JQ8:gIN9vFwOqvQ',
    '',
    'http://feeds.feedburner.com/~ff/Mashable?d=yIl2AUoC8zA',
    '',
    'http://feeds.feedburner.com/~ff/Mashable?d=P0ZAIrC63Ok',
    '',
    'http://feeds.feedburner.com/~ff/Mashable?d=I9og5sOYxJI',
    '',
    'http://feeds.feedburner.com/~ff/Mashable?d=CC-BsrAYo0A',
    '',
    'http://feeds.feedburner.com/~ff/Mashable?i=0N_mvMwPHYk:j5Pmi_N-JQ8:_cyp7NeR2Rw',
    '',
    'http://feeds.feedburner.com/~r/Mashable/~4/0N_mvMwPHYk',
]

I want to:

  1. remove every empty array element,
  2. remove every array item whose name does not end in .jpg, .png, or .gif,
  3. and remove array items containing keywords such as digg, fb, tweet, bizspark.

I would like it to retain:

5 => http://cdn.mashable.com/wp-content/uploads/2010/09/web.png 
6 => http://mashable.com/wp-content/uploads/2010/09/Screen-shot-2010-09-24-at-10.51.26-PM.png 

Any ideas?


Solution

  • Using array_filter() with a callback, for example, gives you flexibility and ease of maintenance (changing requirements, debugging, etc.):

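    function url_array_filter($url)
    {
        // Blacklisted keywords and whitelisted extensions (static: initialised only once).
        static $words = array('digg', 'fb', 'tweet', 'bizspark');
        static $extens = array('.jpg', '.png', '.gif');
        $ret = true;
        if (!$url) {
            // Drop empty elements.
            $ret = false;
        } elseif (str_replace($words, '', $url) !== $url) {
            // Drop URLs containing any blacklisted keyword.
            $ret = false;
        } else {
            // Drop URLs whose path does not end in a whitelisted extension.
            $path = (string) parse_url($url, PHP_URL_PATH);
            if (!in_array(substr($path, -4), $extens)) {
                $ret = false;
            }
        }
        return $ret;
    }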
    
    $arr = array_filter($arr, 'url_array_filter');
    print_r($arr);
    

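    (This works for the array given, but it is demo code and may need changes: the extension check is case-sensitive and only looks at the last four characters of the path, so .jpeg or .PNG would not match.)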
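
Since the task is essentially pattern matching, another option might be two preg_grep() passes: one to whitelist the extensions and one, inverted, to drop the blacklisted keywords. The following is only a minimal sketch. $urls is a shortened, hypothetical stand-in for the array in the question, and both patterns are my assumption of what is wanted. Note that it tests the end of the whole URL rather than the parsed path, so an image URL followed by a query string would be dropped.

```php
<?php
// Hypothetical sample: a shortened stand-in for the array in the question.
$urls = [
    'http://mashable.com/wp-content/plugins/wp-digg-this/i/fb.jpg',
    'http://cdn.mashable.com/wp-content/uploads/2010/09/web.png',
    '',
    'http://feeds.feedburner.com/~ff/Mashable?d=qj6IDK7rITs',
];

// Pass 1: keep only URLs ending in a whitelisted extension
// (case-insensitive); empty strings cannot match and are dropped too.
$kept = preg_grep('~\.(?:jpg|png|gif)$~i', $urls);

// Pass 2: PREG_GREP_INVERT returns the elements that do NOT match,
// i.e. it drops every URL containing a blacklisted keyword.
$kept = preg_grep('~digg|fb|tweet|bizspark~i', $kept, PREG_GREP_INVERT);

// preg_grep() preserves the original keys, so only key 1 (web.png) remains.
print_r($kept);
```

If you need the result re-indexed from 0, wrap the final call in array_values().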