Search code examples
perlsortinghierarchicalcmp

Subkey comparison function for sorting


I need a Perl comparison function that can be used with sort.

Each key is a text string that has an arbitrary number of subkeys, separated by delimiter characters (dot, colon, space, and slash). Some subkeys are numeric, and need to be sorted numerically. The key format and number of subkeys varies. Therefore, the comparison has to handle one key being longer than the other, and has to handle the case where a subkey is numeric in one key but not in another (in which case a textual comparison is appropriate for that subkey).

This works, but I bet there are better solutions:

use warnings;
use strict;
use Scalar::Util qw[looks_like_number];

sub hier_cmp {

    my $aa = $a;
    my $bb = $b;

    # convert all delims (. : / space) to the same delim

    $aa =~ tr/.:\/ /::::/;
    $bb =~ tr/.:\/ /::::/;
    my @lista = split(":", $aa);
    my @listb = split(":", $bb);

    my $result;

    for my $ix (0 .. min($#lista, $#listb)) {
        if (exists($lista[$ix]) && exists($listb[$ix])) {
            if ( looks_like_number($lista[$ix]) && looks_like_number($listb[$ix])) {
                # compare numerically
                $result = ($lista[$ix] <=> $listb[$ix]);
            } else {
                # compare as strings
                $result = ($lista[$ix] cmp $listb[$ix]);
            }
            if ($result == 0) {
                next;
            }
            return $result;

        } elsif (exists($lista[$ix])) {
            return 1;
        } else {
            return -1;
        }
    }
}

For my purposes, readability is more important than speed. This is just for an internal tool, and lists will rarely have more than hundreds of elements. However, any opportunity to learn something is good.

As you can see, I'm not a perl wizard. Even trivial improvements on my code would be appreciated.

Thanks!


Solution

  • It would help if you gave us some data to test with, but this code passes a few basic tests and it looks right.

    It simplifies the problem by using the List::MoreUtils function pairwise to create an array of field pairs.

    Then it is just a matter of checking whether only one is defined, when one of the lists has come to an end before the other and should be sorted first; if they are both numeric, when they should be compared with a numeric comparison; or otherwise simply compare them as strings.

    If the end of the array of pairs is reached then everything has matched and zero is returned to indicate equiality.

    Update

    I have changed this code to remove the dependency on List::MoreUtils::pairwise.

    use strict;
    use warnings;
    
    use Scalar::Util 'looks_like_number';
    
    sub hier_cmp {
    
      our ($a, $b);
    
      my @a = split m|[.: /]+|, $a;
      my @b = split m|[.: /]+|, $b;
    
      for my $i (0 .. $#a > $#b ? $#a : $#b) {
        my @ab = ( $a[$i], $b[$i] );
        if (grep defined, @ab < 2) {
          return defined $ab[0] ? 1 : -1;
        }
        else {
          my $numeric = grep(looks_like_number($_), @ab) == 2;
          my $result = $numeric ? $ab[0] <=> $ab[1] : $ab[0] cmp $ab[1];
          return $result if $result;
        }
      }
    
      return 0;
    }