Perl -- sorting a comma-separated source that has "qualifiers"


I'm a weak Perl user (and manipulator of arrays), and this problem is stumping me. Hope someone can help!

I have a source file with the following type of data (greatly simplified):

URL: 22489196
Keywords: Ball, Harga, Call, Dall, Eall, Jarga, Fall

URL: 22493265
Keywords: Hall, Iall, Yarga, Jall, Zarga, Kall

The words interrupting the alphabetical order (Harga, etc.) are "qualifiers", and each one belongs to the keyword just before it. The end result I need is:

22489196

Ball--Harga
Call
Dall
Eall--Jarga
Fall

22493265

Hall
Iall--Yarga
Jall--Zarga
Kall

I've tried various "for" loops, pushing the terms into a second array and shifting the original array on conditional concatenation of its terms, but I still end up with missing or extra terms. Can anyone suggest how this might be done? MANY THANKS in advance!

ADDED: here's one iteration of part of my messy code:

while (<FILE>) {

    if (/URL\:/) {

        print "$_\n";
    }

    if (/Keywords\: /) {

        s/Keywords\: //;
        chomp();

        my @terms    = split ', ', $_;
        my @bakterms = reverse @terms;
        my $noTerms  = @terms;
        my $IzItOdd  = $noTerms%2;
        #my $ctr = $noTerms++;

        for ($i = 0; $i <= $#bakterms; $i++){

            my $j = $i+1;

            if ($j <= $#bakterms) {

                my $one = $bakterms[$i];
                my $two = $bakterms[$j];

                if ($two gt $one) { # i.e., if $two is alphabetically AFTER $one

                    push @ary3, $bakterms[$i];
                    $disarry = 1;
                    my $interloper = $bakterms[$j+1].= "--" . $two;
                    push @ary3, $interloper;
                    shift @bakterms;
                    #$ctr--;
                    shift(@bakterms);
                    #$ctr--;
                }
                else {

                    push @ary3, $bakterms[$i];
                    #shift(@bakterms);
                    shift @bakterms;
                    $disarry = 0;
                }
            }
        }
        @ary3 = sort @ary3;

        foreach my $term (@ary3) {

            print "** $term\n";
        }

        @ary3 = ();
        print"\n";
    }
}
exit 0;

Solution

  • Well, "Harga" doesn't interrupt the alphabetical order; "Call" does, because it sorts before "Harga". So the qualifier is actually the word just before the one that interrupts the alphabetical order.

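    use strict;
    use warnings;

    # Sample "Keywords:" value taken from the question.
    my $keywords = 'Ball, Harga, Call, Dall, Eall, Jarga, Fall';
    my @keywords = split /\s*,\s*/, $keywords;
    my $prev_keyword = '';
    while (@keywords) {
        my $keyword = shift(@keywords);

        my $qualifier;
        if (@keywords >= 1 && $keyword eq $prev_keyword) {
            # A repeated keyword is presumably listed again only because
            # it carries another qualifier, so take the next word as one.
            $qualifier = shift(@keywords);
        }
        elsif (@keywords >= 2 && $keywords[0] gt $keywords[1]) {
            # The next word sorts after the word that follows it,
            # so it breaks the order and must be a qualifier.
            $qualifier = shift(@keywords);
        }

        if (defined($qualifier)) {
            print("$keyword--$qualifier\n");
        } else {
            print("$keyword\n");
        }

        $prev_keyword = $keyword;
    }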
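
To connect the loop above to the URL/Keywords layout shown in the question, here is a minimal sketch. The names `pair_keywords`, `format_records` and `$sample` are made up for illustration and are not part of the original answer. It also assumes every record consists of one `URL:` line and one `Keywords:` line, as in the sample data.

```perl
use strict;
use warnings;

# Hypothetical helper: turn one "Keywords:" value into a list of
# "keyword" or "keyword--qualifier" strings, using the same tests
# as the loop in the answer.
sub pair_keywords {
    my ($line) = @_;
    my @words = split /\s*,\s*/, $line;
    my @out;
    my $prev = '';
    while (@words) {
        my $keyword = shift @words;
        my $qualifier;
        if (@words >= 1 && $keyword eq $prev) {
            $qualifier = shift @words;
        }
        elsif (@words >= 2 && $words[0] gt $words[1]) {
            $qualifier = shift @words;
        }
        push @out, defined $qualifier ? "$keyword--$qualifier" : $keyword;
        $prev = $keyword;
    }
    return @out;
}

# Hypothetical driver: build the report text from a list of input lines.
sub format_records {
    my (@lines) = @_;
    my $out = '';
    for my $line (@lines) {
        chomp $line;
        if ($line =~ /^URL:\s*(\S+)/) {
            $out .= "$1\n\n";             # record id, then a blank line
        }
        elsif ($line =~ /^Keywords:\s*(.*\S)/) {
            my $value = $1;               # copy before any other regex runs
            $out .= "$_\n" for pair_keywords($value);
            $out .= "\n";                 # blank line between records
        }
    }
    return $out;
}

# Sample data copied from the question.
my $sample = <<'END';
URL: 22489196
Keywords: Ball, Harga, Call, Dall, Eall, Jarga, Fall

URL: 22493265
Keywords: Hall, Iall, Yarga, Jall, Zarga, Kall
END

print format_records(split /\n/, $sample);
```

To process a real file, the lines read from an opened filehandle could be passed to `format_records` in place of the sample text. One caveat: a qualifier attached to the very last keyword on a line cannot be detected by this test, because no later word exists to compare it against.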