
Using Perl, push into an array a list of files in a directory except for a few select files based on file name


I'm uploading a tarball through a web page and dropping it into /tmp/. A script (which will be invoked via crontab) should then:

1.) extract it

2.) build a list of all of the files in the directory (files only, found recursively)

3.) search each file for a string and print the filename and each matching line to a file.

Everything is working up to the part where I want to build a list of files in the (extracted tarball) directory. If I don't put a "!" in front of the regex on line 6 in my code (matching only files that are .bak, .conf, .cfg), then I only get a dozen files in @filelist (as I'd expect, printed by the code on line 13).

However, if I put a "!" in front of my regex on line 6 (intended to match everything but those files), line 13 will print all filenames, including files with .bak, .conf, and .cfg extensions.

How can I get a collection of filenames in the (extracted tarball) directory except for those that I'm just not concerned about?

This is my code, roughly (stripped down, untested). I'm a Perl newbie, so I apologize for the ugliness of what I have here, but it is what it is.

 1    sub loadFiles {
 2        my $dir=shift;
 3        find(\&recurDir,"$dir");
 4    }
 5    sub recurDir {
 6        if ( $File::Find::name =~ /(\.bak|\.conf|\.cfg)$/i ) {
 7            push @filelist, $File::Find::name;
 8        }
 9        print "$File::Find::name\n";
10    }
11    sub searcher {
12        my $file=$_;
13        print "Searching $file\n";
14    }
15    my $tarball = '/tmp/mytarball.tar.gz';
16    my $ae = Archive::Extract->new( archive=>$tarball ) || die ("$!");
17    $ae->extract( to=>$UPLOAD_DIR ) || die ("$ae->error");
18    my $dir_loc = File::Spec->catfile( $UPLOAD_DIR, $ae->files->[0]);
19    loadFiles("$dir_loc");
20    find(\&searcher, @filelist);

Solution

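  • With the negated test, directories also pass the check on line 6, so you're adding a directory to @filelist at line 7. The find call on line 20 then walks that directory again, and line 13 prints every file in it and its subdirectories, including the ones you meant to skip.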
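
To make that diagnosis concrete, here is a minimal sketch that builds a throwaway tree with File::Temp and applies the negated test without any file-type check. The directory layout and file names are invented for the demonstration; they are not from the question.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical layout, invented for this demonstration only.
my $root = tempdir( CLEANUP => 1 );
make_path( File::Spec->catdir( $root, 'etc' ) );
for my $rel ( 'etc/app.conf', 'etc/old.bak', 'readme.txt' ) {
    my $path = File::Spec->catfile( $root, split m{/}, $rel );
    open my $fh, '>', $path or die "Cannot create $path: $!";
    close $fh;
}

# Negated extension test with no file-type check, as in the question.
my @filelist;
find(
    sub {
        push @filelist, $File::Find::name
            if $File::Find::name !~ /\.(?:bak|conf|cfg)\z/i;
    },
    $root
);

# Directories pass the test too, including the top-level one.
my @dirs = grep { -d $_ } @filelist;
print "directory in list: $_\n" for @dirs;

# Using the list as find()'s starting points walks each directory again,
# so the excluded files are visited after all.
my @visited;
find( sub { push @visited, $File::Find::name }, @filelist );
my @leaked = grep { /\.(?:bak|conf|cfg)\z/i } @visited;
print "visited anyway: $_\n" for @leaked;
```

Both the temporary root and its etc subdirectory end up in @filelist, which is why the second find reaches the .conf and .bak files.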

    Line 6 should be:

    if ( -f $File::Find::name && $File::Find::name !~ /\.(?:bak|conf|cfg)\z/i ) {
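
As a rough illustration of what the rewritten pattern accepts and rejects, the sketch below wraps it in a helper and runs it over a few sample names. The helper name is_skipped and the sample names are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: true when a name ends in an extension we want to skip.
sub is_skipped {
    my ($name) = @_;
    return $name =~ /\.(?:bak|conf|cfg)\z/i ? 1 : 0;
}

for my $name ( 'httpd.conf', 'OLD.BAK', 'notes.txt', 'conf', "app.cfg\n" ) {
    ( my $shown = $name ) =~ s/\n/\\n/;    # make the newline visible
    printf "%-12s %s\n", $shown, is_skipped($name) ? 'skip' : 'keep';
}
```

The first two names are reported as skip and the other three as keep. The last one shows the difference between \z and $: \z matches only at the very end of the string, whereas $ would also match just before a trailing newline. The (?:...) group simply avoids an unused capture.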
    

    Line 20 should be:

    searcher($_) for @filelist;
    

    searcher should be:

    sub searcher {
       my ($file) = @_;
       print "Searching $file\n";
    }
    

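    Avoiding global variables, the whole thing looks like:

    use File::Find;

    # Return every plain file under $dir, skipping .bak, .conf and .cfg files.
    sub loadFiles {
        my $dir=shift;

        my @filelist;
        my $wanted = sub {
            return if $File::Find::name =~ /\.(?:bak|conf|cfg)\z/i;
            # find() changes directory as it walks, so testing the full
            # name like this assumes $dir is an absolute path.
            return if !-f $File::Find::name;
            push @filelist, $File::Find::name;
        };

        find($wanted, $dir);
        return @filelist;
    }

    sub searcher {
        my $file=shift;
        print "Searching $file\n";
    }

    # $dir_loc is set by the extraction code in the question.
    searcher($_) for loadFiles($dir_loc);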
    

    (Technically, you could do searcher($File::Find::name); directly instead of pushing it to an array then later looping over the array.)
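
A sketch of that last idea, under some assumptions: searcher is fleshed out here to do the question's step 3 (report each line containing a literal string), search_tree is a hypothetical wrapper name, and the "path:line: text" output format is invented for the example. It also sets File::Find's no_chdir option so that the -f test on $File::Find::name works even when the starting directory is a relative path.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return "path:line_number: text" for every line of
# $file that contains the literal string $needle.
sub searcher {
    my ( $file, $needle ) = @_;
    my @hits;
    open my $fh, '<', $file
        or do { warn "Cannot open $file: $!\n"; return };
    while ( my $line = <$fh> ) {
        chomp $line;
        push @hits, "$file:$.: $line" if index( $line, $needle ) >= 0;
    }
    close $fh;
    return @hits;
}

# Walk $dir and search each wanted file as soon as it is found, without
# building an intermediate array of file names.
sub search_tree {
    my ( $dir, $needle ) = @_;
    my @hits;
    find(
        {
            no_chdir => 1,    # stay put, so full names are valid as-is
            wanted   => sub {
                return if $File::Find::name =~ /\.(?:bak|conf|cfg)\z/i;
                return if !-f $File::Find::name;
                push @hits, searcher( $File::Find::name, $needle );
            },
        },
        $dir
    );
    return @hits;
}

# Usage sketch: perl search.pl DIR STRING
if ( @ARGV == 2 ) {
    print "$_\n" for search_tree(@ARGV);
}
```

Returning the hits rather than printing inside the callback leaves it to the caller to decide where the report goes, for example a file handle opened for the output file.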