
How can I sort a list of strings by numbers in them?


I have a list of filenames which are like so:

fw_d.log.1.gz  
through  
fw_d.log.300.gz  

When I use the code below, it almost sorts the list the way I want, but not quite:

#!/usr/bin/perl -w
my $basedir = "/var/log";
my @verdir = qw(fw_d);
my $fulldir;
my $configs;
my $combidir;

foreach $combidir (@verdir) {
    $fulldir = "$basedir/$combidir";
    opendir (DIR, $fulldir);
    my @files = grep { $_ ne '.' && $_ ne '..' && $_ ne 'CVS' } readdir DIR;
    closedir (DIR);
    @files1 = sort {$a cmp $b}(@files);
    foreach my $configs (@files1) {
        print "Checking $configs\n";
        system("less $basedir/$combidir/$configs | grep \'.* Group = , Username = .* autheauthenticated.\' >> output.log" );
    }
}

Here is a snippet output:

Checking fw_d.log  
Checking fw_d.log.1.gz  
Checking fw_d.log.10.gz  
Checking fw_d.log.100.gz  
Checking fw_d.log.101.gz  
Checking fw_d.log.102.gz  

As you can see, it almost sorts the files the way I was hoping, but 10 and 100 come straight after 1. Does anyone have suggestions, either something to read or a code snippet I can use?


Solution

  • You could use a Schwartzian transform:

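    use Data::Dumper;

    # Read bottom to top: pair each name with its first run of digits,
    # sort numerically on that number, then keep only the name.
    my @sorted = map  { $_->[0] }
                 sort { $a->[1] <=> $b->[1] }
                 map  { [$_, $_=~/(\d+)/] }
                     @files;
    print Dumper \@sorted;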
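
To see the effect on names like those in the question, here is a small self-contained sketch. The sample list is made up for illustration, and treating a name with no digits (such as fw_d.log) as 0 is an assumption of this sketch, chosen so that the unrotated log sorts first and no "uninitialized value" warning is raised.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample names, modelled on the ones in the question.
my @files = qw(
    fw_d.log.10.gz fw_d.log.2.gz fw_d.log fw_d.log.100.gz fw_d.log.1.gz
);

my @sorted =
    map  { $_->[0] }                      # 3. unwrap: keep only the name
    sort { $a->[1] <=> $b->[1] }          # 2. compare the cached numbers
    map  { [ $_, /(\d+)/ ? $1 : 0 ] }     # 1. pair each name with its first number
    @files;                               #    (assumption: no digits means 0)

print "Checking $_\n" for @sorted;
```

With this list the names come out in the order fw_d.log, fw_d.log.1.gz, fw_d.log.2.gz, fw_d.log.10.gz, fw_d.log.100.gz.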
    

    Here is a benchmark comparing the Schwartzian transform with a sort that calls a helper subroutine inside the comparison block:

    use Benchmark qw(:all);
    
    # build list of files
    my @files = map {'fw_d.log.'.int(rand()*1000).'.log' } 0 ..300;
    
    my $count = -3;
    my $r = cmpthese($count, {
            'subname' => sub {
                  sub expand {
                       my $file=shift; 
                       $file=~s{(\d+)}{sprintf "%04d", $1}eg;
                       return $file;
                  }
                  my @sorted = sort { expand($a) cmp expand($b) } @files;
            },
            'schwartzian' => sub {
                  my @sorted = map  { $_->[0] }
                               sort { $a->[1] <=> $b->[1] }
                               map  { [$_, $_=~/(\d+)/] }
                               @files;
            }
    });
    

    Result:

                  Rate     subname schwartzian
    subname     21.2/s          --        -92%
    schwartzian  279/s       1215%          --
    

    In this benchmark, the Schwartzian transform is about 13 times faster (279/s versus 21.2/s) at sorting a list of about 300 file names.
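
A likely reason for the gap is how often each approach computes a sort key: the comparison block runs once per comparison and builds two keys each time, while the transform extracts one key per file up front. The sketch below counts those computations. The names expand_counted and first_number, the counters, and the shuffled sample list are all hypothetical, added only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw(shuffle);

# Hypothetical input: 300 distinct names in random order.
my @files = shuffle map { "fw_d.log.$_.gz" } 1 .. 300;

# Variant 1: zero-pad the digits inside the comparator (key rebuilt every time).
my $sub_calls = 0;
sub expand_counted {
    my ($file) = @_;
    $sub_calls++;
    (my $padded = $file) =~ s{(\d+)}{sprintf "%04d", $1}eg;
    return $padded;
}
my @by_sub = sort { expand_counted($a) cmp expand_counted($b) } @files;

# Variant 2: Schwartzian transform (key extracted once per element).
my $extractions = 0;
sub first_number {
    my ($file) = @_;
    $extractions++;
    my ($n) = $file =~ /(\d+)/;
    return defined $n ? $n : 0;
}
my @by_st = map  { $_->[0] }
            sort { $a->[1] <=> $b->[1] }
            map  { [ $_, first_number($_) ] }
            @files;

printf "comparator sort:       %d key computations\n", $sub_calls;
printf "Schwartzian transform: %d key computations\n", $extractions;
```

The transform always reports 300 here, one per file. The comparator count depends on the input order and on Perl's sort implementation, but it is at least twice the number of comparisons, so it grows faster than the list does.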