
Is any array item contained in a string?


I have a list of keywords and a blacklist. I want to delete every keyword that contains any blacklist item. At the moment I'm doing it this way:

my @keywords = ( 'some good keyword', 'some other good keyword', 'some bad keyword');
my @blacklist = ( 'bad' );

A: for my $keyword ( @keywords ) {
    B: for my $bl ( @blacklist ) {
        next A if $keyword =~ /$bl/i;      # omitting $keyword
    }
    # some keyword cleaning (for instance: erasing non a-zA-Z0-9 characters, etc)
}

I was wondering whether there is a faster way to do this, because at the moment I have about 25 million keywords and a couple of hundred words in the blacklist.


Solution

  • The most straightforward option is to join the blacklist entries into a single regular expression, then grep the keyword list for those which don't match that regex:

    #!/usr/bin/env perl    
    
    use strict;
    use warnings;
    use 5.010;
    
    my @keywords = 
      ('some good keyword', 'some other good keyword', 'some bad keyword');
    my @blacklist = ('bad');
    
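    # Escape each entry so it matches literally, and compile the pattern once.
    # The /i flag keeps the case-insensitive matching used in the question.
    my $pattern = join '|', map { quotemeta } @blacklist;
    my $re      = qr/$pattern/i;
    my @good    = grep { !/$re/ } @keywords;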
    
    say join "\n", @good;
    

    Output:

    some good keyword
    some other good keyword
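
The same idea can be wrapped in two small helpers, which may be convenient when the filter is reused or the blacklist can be empty. This is only a sketch: `build_blacklist_re` and `filter_keywords` are hypothetical names, and it assumes the blacklist entries are literal words (not regex patterns) that should match case-insensitively, as the `/i` in the question suggests.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: build one case-insensitive regex from literal words.
sub build_blacklist_re {
    my @words = @_;
    # With no words, return a pattern that can never match,
    # so that an empty blacklist rejects nothing.
    return qr/(?!)/ unless @words;
    # quotemeta escapes regex metacharacters so each entry matches literally.
    my $alternation = join '|', map { quotemeta } @words;
    return qr/$alternation/i;    # compiled once, reused for every keyword
}

# Hypothetical helper: return the keywords that contain no blacklist entry.
sub filter_keywords {
    my ($re, @keywords) = @_;
    return grep { !/$re/ } @keywords;
}

my @keywords  = ('some good keyword', 'some other good keyword', 'some BAD keyword');
my @blacklist = ('bad', 'c++');

my $re   = build_blacklist_re(@blacklist);
my @good = filter_keywords($re, @keywords);

print "$_\n" for @good;
```

With this sample data the script prints the two "good" keywords. For a list of 25 million keywords it is probably better to read them one at a time from a file and apply `next if $keyword =~ $re;` inside the loop, so that the whole list is never copied into a function call.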