Search code examples
arraysperlhashuniquecombinations

How to extract data from an array that contains only one occurrence of an element and only one occurrence of a corresponding element in Perl


I'm trying to figure out how to extract records from a file that contains only one occurrence of a trainer and only one occurrence of a jockey.

Essentially, the record would imply that the jockey has only one ride for the day and it is for trainer X who has only one runner for the day.

Here are some "sample data":

ALLAN DENHAM,MUSWELLBROOK,RACE 5,MOPITTS (10),JEFF PENZA,B,5
ALLAN KEHOE,MUSWELLBROOK,RACE 3,FOXY FIVE (5),KOBY JENNINGS,C,3
ALLAN KEHOE,MUSWELLBROOK,RACE 4,BANGALLEY LAD (3),KOBY JENNINGS,BBB,4
ANDREW ROBINSON,MUSWELLBROOK,RACE 6,TROPHIES GALORE (4),DARRYL MCLELLAN,AAA,6
BEN HILL,MUSWELLBROOK,RACE 4,WHALER BILL (10),GRANT BUCKLEY,BB,4
BEN HILL,MUSWELLBROOK,RACE 5,MR BILL (5),GRANT BUCKLEY,BB,4
BJORN BAKER,MUSWELLBROOK,RACE 3,MISS JAY FOX (9),ALYSHA COLLETT,BB,3
BRETT CAVANOUGH,MUSWELLBROOK,RACE 3,OFFICE AFFAIR (10),RACHAEL MURRAY,B,3
BRETT THOMPSON,MUSWELLBROOK,RACE 7,COSTAS (2),RONALD SIMPSON,BB,7
CODY MORGAN,MUSWELLBROOK,RACE 6,BAJAN GOLD (5),JEFF PENZA,BB,6
CODY MORGAN,MUSWELLBROOK,RACE 7,RAPID EAGLE (9),DARRYL MCLELLAN,B,7

In the sample data, the first record that would meet my criteria would be the following:

BJORN BAKER,MUSWELLBROOK,RACE 3,MISS JAY FOX (9),ALYSHA COLLETT,BB,3

Note: BJORN BAKER only appears once and ALYSHA COLLETT only appears once.

In the sample data, trainer ALLAN DENHAM has only one runner for the day but jockey JEFF PENZA has 2 rides, one for trainer ALLAN DENHAM & one for trainer CODY MORGAN so this does not my meet my criteria.

Another record that would meet my criteria would be the following record:

BRETT CAVANOUGH,MUSWELLBROOK,RACE 3,OFFICE AFFAIR (10),RACHAEL MURRAY,B,3

Note: BRETT CAVANOUGH only appears once and RACHAEL MURRAY only appears once.

BRETT THOMPSON,MUSWELLBROOK,RACE 7,COSTAS (2),RONALD SIMPSON,BB,7

Note: BRETT THOMPSON only appears once and RONALD SIMPSON only appears once.

And so on...

I've loaded the "sample data" (top of page) into an array in Perl and have investigated how to use hash, etc. in order to extract the unique records but I cannot figure out how to extract the required records based on the uniqueness of the combination of the two elements (i.e. one trainer + the one corresponding jockey)

use Data::Dumper;

$infile = "TRAINER-JOCKEY-SAMPLE.txt";

open my $infile, "<:encoding(utf8)", $infile or die "$infile: $!";
my @recs = <$infile>;
close $infile;

my %uniques;
for my $rec (@recs) 
{
    my ($trainer, $racecourse, $racenum, $hnameandnum, $jockey, $TDRating, $rnum) = split(",", $rec);
    ++$uniques{$trainer}{$jockey};
}

print Dumper(\%uniques);

for my $trainer (sort keys %uniques) 
{
    my $answer = join ',', sort keys %{ $uniques{$trainer} };
    print "$trainer has unique values $answer\n";
}

Note: I need to print the entire record when successful (see below):

BJORN BAKER,MUSWELLBROOK,RACE 3,MISS JAY FOX (9),ALYSHA COLLETT,BB,3

Your help would be greatly appreciated.


Solution

  • Both the trainer and the jockey have to appear just once in the list (unless the input has duplicate lines).

    So, let's just count the occurrences of trainers. To be able to match them to jockeys, we'll store jockeys to trainers in a hash of hashes.

    Once we build the two structures, just select the jockeys with only one associated trainer and check that the trainer appeared just once, which had to be with the jockey they were associated to.

    #!/usr/bin/perl
    use warnings;
    use strict;
    use feature qw{ say };
    
    my (%jockeys, %trainers);
    while (<>) {
        my ($jockey, $trainer) = (split /,/)[0, 4];
        ++$trainers{$trainer};
        undef $jockeys{$jockey}{$trainer};
    }
    
    for my $jockey (keys %jockeys) {
        next if 1 < keys %{ $jockeys{$jockey} };
    
        my $trainer = (keys %{ $jockeys{$jockey} })[0];
        say "$jockey,$trainer" if 1 == $trainers{$trainer};
    }
    

    Update: To print the whole lines, we need to store them somewhere, too. We can slightly modify the program by remembering the whole lines in another hash; we can use either the trainer or the jockey as the key.

    #!/usr/bin/perl
    use warnings;
    use strict;
    
    my (%jockeys, %trainers, %full);
    while (<>) {
        my ($jockey, $trainer) = (split /,/)[0, 4];
        ++$trainers{$trainer};
        undef $jockeys{$jockey}{$trainer};
        $full{$jockey} = $_;
    }
    
    for my $jockey (keys %jockeys) {
        next if 1 < keys %{ $jockeys{$jockey} };
    
        my $trainer = (keys %{ $jockeys{$jockey} })[0];
    
        print $full{$jockey} if 1 == $trainers{$trainer};
    }