How do I speed up pattern recognition in Perl?

This is the program as it stands right now. It takes in a .fasta file (a file containing genetic code), creates a hash table with the data, and prints it. However, it is quite slow: it splits the sequence into 5-character substrings and compares each one against every other 5-character substring in the file.

use strict;
use warnings;
use Data::Dumper;

my $total = $#ARGV + 1;
my $row;
my $compare;
my %hash;
my $unique = 0;
open( my $f1, '<:encoding(UTF-8)', $ARGV[0] ) or die "Could not open file '$ARGV[0]' $!\n";

my $discard = <$f1>;
while ( $row = <$f1> ) {
    chomp $row;
    $compare .= $row;
}
my $size = length($compare);
close $f1;
for ( my $i = 0; $i < $size - 6; $i++ ) {
    my $vs = ( substr( $compare, $i, 5 ) );
    for ( my $j = 0; $j < $size - 6; $j++ ) {
        foreach my $value ( substr( $compare, $j, 5 ) ) {
            if ( $value eq $vs ) {
                if ( exists $hash{$value} ) {
                    $hash{$value} += 1;
                } else {
                    $hash{$value} = 1;
                }
            }
        }
    }
}
foreach my $val ( values %hash ) {
    if ( $val == 1 ) {
        $unique++;
    }
}

my $OUTFILE;
open $OUTFILE, ">output.txt" or die "Error opening output.txt: $!\n";
print {$OUTFILE} "Number of unique keys: " . $unique . "\n";
print {$OUTFILE} Dumper( \%hash );
close $OUTFILE;

Thanks in advance for any help!

Solution

It is not clear from the description what is wanted from this script, but if you're looking for matching sets of 5 characters, you don't actually need to do any string matching. The nested loops compare every position with every other position, so the work grows with the square of the sequence length. Instead, you can run through the whole sequence once and keep a tally of how many times each 5-letter sequence occurs.

use strict;
use warnings;
use Data::Dumper;

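my $str = ''; # store the sequence here
my %hash;
# read the whole file into one string
open( my $in, '<:encoding(UTF-8)', $ARGV[0] ) or die "Could not open file '$ARGV[0]' $!\n";
my $header = <$in>; # discard the first line, as the original script does
while ( my $line = <$in> ) {
    chomp $line;
    $str .= $line;
}
close $in;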

# the original loop ($i < $size - 6) stops two windows short of the end;
# not sure if that was deliberate, so this loop looks at every 5-letter window
my $l_size = length($str) - 4;
for (my $i = 0; $i < $l_size; $i++) {
    $hash{ substr($str, $i, 5) }++;
}

# grep in scalar context returns the number of values that match.
my $unique = grep { $_ == 1 } values %hash;

open( my $out, '>', 'output.txt' ) or die "Error opening output.txt: $!\n";
print {$out} "Number of unique keys: " . $unique . "\n";
print {$out} Dumper( \%hash );
close $out;
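
To check the tallying idea without a .fasta file, here is a minimal sketch that runs the same single pass over a short in-memory string. The helper name count_kmers and the sample sequence are made up for illustration; they are not part of the script above.

```perl
use strict;
use warnings;

# count_kmers is a hypothetical helper name used only for this sketch.
# It returns a hash ref mapping each $k-letter window of $seq to its count.
sub count_kmers {
    my ( $seq, $k ) = @_;
    my %count;
    # The last window starts at length - k, so this range covers every window.
    # If the sequence is shorter than $k the range is empty and nothing is counted.
    for my $i ( 0 .. length($seq) - $k ) {
        $count{ substr( $seq, $i, $k ) }++;
    }
    return \%count;
}

# Made-up example sequence, for illustration only.
my $counts = count_kmers( 'ACGTACGTA', 5 );

# grep in scalar context returns how many windows occur exactly once.
my $unique = grep { $_ == 1 } values %$counts;

for my $kmer ( sort keys %$counts ) {
    print "$kmer: $counts->{$kmer}\n";
}
print "Number of unique keys: $unique\n";
```

For this 9-letter string there are five windows: ACGTA occurs twice and CGTAC, GTACG, and TACGT once each, so the script reports 3 unique keys. Note that the nested loops in the question add 1 for every matching pair of positions, so a window that occurs n times ends up with a value of n*n there, while this tally stores n. A value of 1 means the same thing in both.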