I am attempting to get a percentage of an edit distance from a group of sequences. So far this is what I have:
#!/usr/bin/perl -w
use strict;
use Text::Levenshtein qw(distance);
my @sequence = qw(CA--------W----------------------EKDRRTEAF---F------
CA--------W----------------------EKDRRTEAF---F------
CA--------S-------------------SLVFGQGDNIQY---F------
RA--------S-------------------SLIYSP----LH---F------);
foreach my $list (@sequence){
my @distance = distance($list, @sequence);
my @length = $list =~ tr/[A-Z]///;
}
I am able to get the edit distance with @distance
and the length of each sequence, based on the letters with @length
. If printed the results are as follows:
@distance
0 0 13 14
0 0 13 14
13 13 0 11
14 14 11 0
@length
13
13
16
12
As each line of @length
is equivalent to each line of @sequence
, when comparing the two lines I would like to use the largest @length
to get the percentage. So as when having an edit distance between the second and third sequence it would use the length of 16 rather than 13 to get a percentage. What I think needs to happen is to call only two elements of the @length
array and pick the larger one to then put into a percentage, possibly using an if
statement.
I know this code is wrong, but it is generally the idea I am going for:
foreach my $list (@sequence){
my @distance = distance($list, @sequence);
my @length = $list =~ tr/[A-Z]//; # / syntax hilite fix
foreach my $item(@distance){
foreach @length {
my $num1 = if $length[0] >= $length[1];
print "$item/$num1\n";
else my $num2 = $length[1] >= $length[0];
print "$item/$num2\n";
}
}
}
The answer should look something similar to that below:
0 0 .8125 1.0769 0 0 .8125 1.0769 .8125 .8125 0 .6875 1.0769 1.0769 .6875 0
Try this. To summarize: We compute the edit distances between pairs of
strings. For each pair we want to determine the fraction of the distance and the maximum number of characters (A-Z
). The maximum number of characters is taken to be the maximum for the two items in the pair.
use strict;
use warnings;
use Text::Levenshtein qw(distance);
my @sequence = qw(
CA--------W----------------------EKDRRTEAF---F------
CA--------W----------------------EKDRRTEAF---F------
CA--------S-------------------SLVFGQGDNIQY---F------
RA--------S-------------------SLIYSP----LH---F------
);
my @length = map { tr/[A-Z]// } @sequence;
for my $i (0..$#sequence) {
my $list = $sequence[$i];
my @distance = distance($list, @sequence);
my $num1 = $length[$i];
for my $j (0..$#distance) {
my $item = $distance[$j];
my $num2 = $length[$j];
my $num = ( $num2 > $num1 ) ? $num2 : $num1;
printf "%.4f ", $item/$num;
}
print "\n";
}
Output:
0.0000 0.0000 0.8125 1.0769
0.0000 0.0000 0.8125 1.0769
0.8125 0.8125 0.0000 0.6875
1.0769 1.0769 0.6875 0.0000