
How to get a list of all HTML files in all subdirectories, sorted alphabetically, in Perl?


Currently I am doing this:

# Find all files
File::Find::find(
    sub {
        my $file = $_;
        return if -d $file;
        return if $file !~ /\.html?$/;   # keep only .htm / .html files

        # ... my processing code

    }, $inputdir
);

But I want to process the files in alphabetical order. Ultimately I'd like to store all the file names in an array, sort the array, and then run my processing code in a foreach loop, but I am completely stuck on how to do it.

I've done lots of googling, but as with everything else in Perl there are hundreds of ways to do it, and none of them seem to cover everything I want: all files ending in .html, in all subdirectories of a specific directory, sorted alphabetically by file name rather than by directory structure.

Can anyone help me out? I know this can be done fairly easily, I just cannot figure it out.

Thanks :)

edit: I've tried doing this:

File::Find::find(
    sub {
        #Only process html files
        my $file = $_;
        return if -d $file;
        return if $file !~ /\.html?$/;

        push(@files, $File::Find::name);

    }, $inputdir 
);

But if I then sort the array @files, it is sorted on the entire path string, and I just want to sort on the file name. I don't think there is a way to do this inside File::Find::find itself, as it cannot know the order until it has traversed all the files, so I need to do the sort afterwards.


Solution

  • You can use File::Basename (which parses file paths into directory, filename and suffix) together with a Schwartzian transform to sort the files by file name, like this:

     use File::Basename;

     # Pair each path with its file name, sort on the name, then unwrap.
     @files = map { $_->[0] }
        sort { $a->[1] cmp $b->[1] }
        map { [$_, fileparse($_, qr/\.html?/)] } @files;
    

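    The fileparse() routine of File::Basename divides a file path into its $directories, $filename and (optionally) the filename $suffix. In list context it returns the filename first, so each path is paired with its filename, the pairs are sorted on that filename, and the original paths are then pulled back out in sorted order.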
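
    Putting the collection step from the question and the sort above together, a minimal sketch might look like the following. The sub name html_files_by_name is made up for illustration. The no_chdir option, the case-insensitive match and the tie-break on the full path are assumptions on my part, not something the question asked for.

```perl
use strict;
use warnings;
use File::Find;
use File::Basename qw(fileparse);

# Hypothetical helper: returns the full paths of all .htm/.html files
# below $dir, sorted by file name (ties broken by full path).
sub html_files_by_name {
    my ($dir) = @_;
    my @files;

    find(
        {
            no_chdir => 1,    # assumption: keep $_ equal to $File::Find::name
            wanted   => sub {
                return unless -f $_;             # skip directories etc.
                return unless /\.html?\z/i;      # keep only .htm / .html
                push @files, $_;
            },
        },
        $dir
    );

    # Schwartzian transform: compute each file name once, sort, unwrap.
    return map  { $_->[0] }
           sort { $a->[1] cmp $b->[1] or $a->[0] cmp $b->[0] }
           map  { [ $_, ( fileparse( $_, qr/\.html?/i ) )[0] ] }
           @files;
}
```

    The processing code from the question would then go inside a loop such as "for my $path (html_files_by_name($inputdir)) { ... }", which visits the files in file-name order regardless of which subdirectory they live in.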