Search code examples
perl

How can I sort a list of files by the multiple components in their names?


I have list of Arrays (INPUT) listed below: Required output in the second column.

Hint: Sequence order must be based on the Colur MODE (wc, gs, pc).

        INPUT                           OUTPUT
        -----                           ------
PDIS_01220157RE_pc_bio1.tif     PDIS_01220157RE_gs_f1.tif
PDIS_01220157RE_pc_bio2.tif     PDIS_01220157RE_wc_f1.tif
PDIS_01220157RE_pc_bio3.tif     PDIS_01220157RE_gs_f2.tif
PDIS_01220157RE_pc_bio4.tif     PDIS_01220157RE_wc_f2.tif
PDIS_01220157RE_wc_bio1.tif     PDIS_01220157RE_gs_f3.tif
PDIS_01220157RE_wc_bio2.tif     PDIS_01220157RE_wc_f3.tif
PDIS_01220157RE_wc_bio3.tif     PDIS_01220157RE_gs_f4.tif
PDIS_01220157RE_wc_bio4.tif     PDIS_01220157RE_wc_f4.tif
PDIS_01220157RE_gs_f1.tif       PDIS_01220157RE_gs_f5.tif
PDIS_01220157RE_wc_f1.tif       PDIS_01220157RE_wc_f5.tif
PDIS_01220157RE_gs_f2.tif       PDIS_01220157RE_gs_f6.tif
PDIS_01220157RE_wc_f2.tif       PDIS_01220157RE_gs_f7.tif
PDIS_01220157RE_gs_f3.tif       PDIS_01220157RE_gs_f8.tif
PDIS_01220157RE_wc_f3.tif       PDIS_01220157RE_gs_f9.tif
PDIS_01220157RE_gs_f4.tif       PDIS_01220157RE_wc_f9.tif
PDIS_01220157RE_wc_f4.tif       PDIS_01220157RE_pc_f10.tif
PDIS_01220157RE_gs_f5.tif       PDIS_01220157RE_wc_f10.tif
PDIS_01220157RE_wc_f5.tif       PDIS_01220157RE_wc_f11.tif
PDIS_01220157RE_gs_f6.tif       PDIS_01220157RE_wc_f12-1.tif
PDIS_01220157RE_gs_f7.tif       PDIS_01220157RE_pc_f12-1.tif
PDIS_01220157RE_gs_f8.tif       PDIS_01220157RE_wc_f12-2.tif
PDIS_01220157RE_gs_f9.tif       PDIS_01220157RE_pc_f12-2.tif
PDIS_01220157RE_wc_f9.tif       PDIS_01220157RE_wc_f12-11.tif
PDIS_01220157RE_pc_f10.tif      PDIS_01220157RE_gs_f12-11.tif
PDIS_01220157RE_wc_f10.tif      PDIS_01220157RE_wc_bio1.tif
PDIS_01220157RE_wc_f11.tif      PDIS_01220157RE_pc_bio1.tif
PDIS_01220157RE_gs_f12-11.tif       PDIS_01220157RE_wc_bio2.tif
PDIS_01220157RE_pc_f12-1.tif        PDIS_01220157RE_pc_bio2.tif
PDIS_01220157RE_pc_f12-2.tif        PDIS_01220157RE_wc_bio3.tif
PDIS_01220157RE_wc_f12-1.tif        PDIS_01220157RE_pc_bio3.tif
PDIS_01220157RE_wc_f12-11.tif       PDIS_01220157RE_wc_bio4.tif
PDIS_01220157RE_wc_f12-2.tif        PDIS_01220157RE_pc_bio4.tif

Code:

Bit of code:

#-------------------->
sub IMGSValidations
#-------------------->
{
    my @ChkAllImgs = @_; my @backImgs = @ChkAllImgs; my $figname = join "\n", @ChkAllImgs;
    my @sorted_array = ();
    $WC_TrFl = $GS_TrFl = $PC_TrFl = 'False';

    my $ImgFileJoinTemp = join "\n", @backImgs;

    if($ImgFileJoinTemp=~m/\_f(\d+)\-/)
    {
        @sorted_array = sort {
          my ($anum) = ($a =~ /\-(\d+)\./);
          my ($bnum) = ($b =~ /\-(\d+)\./);
          $anum <=> $bnum
        } @backImgs;
    }
    if($ImgFileJoinTemp=~m/\_f(\d+)/)
    {
        @sorted_array = sort {
          my ($anum) = ($a =~ /\_f(\d+)/);
          my ($bnum) = ($b =~ /\_f(\d+)/);
          $anum <=> $bnum
        } @backImgs;
    }

    $figname = join "\n", @sorted_array;
    my $FigListText = "$curDir/Test.txt"; writeFileinString($FigListText,\$figname);
}

I am not getting the expected output. Could someone please advice to get the output.


Solution

  • Defining the sort order is the first thing you need to do. If you can't describe the sort order to a person, you can't possibly describe it to Perl.

    To do that, we'll need some nomenclature. The names of the www have these parts:

    • The xxx (PDIS)
    • The yyy (01220157RE)
    • The mode (wc)
    • The zzz (f12-1)
      • The zzz prefix (f)
    • The extension (.tif)

    (I had to make up some words. The OP really failed at providing necessary details!)

    The xxx and the yyy could be considered a single part, but I'm going with two because it allows me to use split /_/.

    The zzz could be subdivided further, but it turns out that it isn't needed.


    Now, the order. I'm guessing the following is the order we want:

    1. By xxx, using a case-sensitive string compare.
    2. By yyy, using a case-sensitive string compare.
    3. By zzz prefix, f → others. (???)
    4. By zzz, using a "natural" string compare.
    5. By mode, gswcpc. (???)
    6. By the file name, using a case-sensitive string compare. (Probably not needed. Handles multiple file names with different extensions for the same www. I like having sorts that provide only one possible orderings, if only to ease testing.)

    Each line after the first handles ties from the preceding line.

    Note that this places PDIS_01220157RE_wc_f10.tif before PDIS_01220157RE_pc_f10.tif, in conflict with the stated desired output.

    Note that this places PDIS_01220157RE_gs_f12-11.tif before PDIS_01220157RE_wc_f12-11.tif, in conflict with the stated desired output.

    (Because of these conflicts, I might guess that the mode shouldn't be part of the sort order, except it's the one thing the OP indicated was part of the sort order. This suggests that what little information the OP did provide, the desired output, is not correct!)


    Finally, now that we have our sort order defined, it's easy to implement.

    use experimental qw( regex_sets );
    
    use Sort::Key::Natural qw( );
    
    use Sort::Key::Maker www_sort => \&www_key, qw( str str int nat int str );
    
    my @ordered_modes = qw( gs wc pc );
    
    my %mode_ranks = map { $ordered_modes[ $_ ] => $_ } 0 .. $#ordered_modes;
    
    sub www_key {
       my ( $name, $ext ) = /^(.*)((?:\.[^.*])?)\z/s;
       my ( $xxx, $yyy, $mode, $zzz ) = split /_/, $name;
       my ( $zzz_pre ) = $zzz =~ /^((?[ \w - \d ])*)/;
    
       my $zzz_rank = $zzz_pre eq "f" ? 0 : 1;
       my $mode_rank = $mode_ranks{ $mode };
    
       return $xxx, $yyy, $zzz_rank, $zzz, $mode_rank, $_;
    }
    

    Usage:

    my @unsorted = www_sort @unsorted;
    

    use experimental qw( regex_sets ); is safe since the experimental feature was accepted without change in 5.36, at which point use experimental qw( regex_sets ); is no longer needed though harmless. You could also use /^([^\W\d]*)/ instead of /^((?[ \w - \d ])*)/.