I am having trouble counting the occurrences of each element in a PDB file. So far, my code extracts the lines from the file that start with ATOM, and I can print the element from each line. Now I want to count each element (C, O, N, and so on) and list the counts in alphabetical order by element. The data looks like this:
ATOM 4851 O PRO A 715 89.164 76.083 75.292 1.00 99.41 O
ATOM 4852 CB PRO A 715 88.324 78.267 73.865 1.00 95.88 C
ATOM 4853 CG PRO A 715 88.836 78.838 72.540 1.00 95.93 C
ATOM 4854 CD PRO A 715 90.288 78.546 72.593 1.00 94.99 C
ATOM 4855 N PHE A 716 90.009 77.320 76.994 1.00100.00 N
ATOM 4856 CA PHE A 716 90.100 76.203 77.966 1.00 99.66 C
ATOM 4857 C PHE A 716 89.942 76.667 79.419 1.00 99.17 C
ATOM 4858 O PHE A 716 88.895 76.334 80.027 1.00100.00 O
ATOM 4859 CB PHE A 716 91.409 75.402 77.819 1.00 97.04 C
ATOM 4860 CG PHE A 716 91.290 74.219 76.899 1.00 96.00 C
ATOM 4861 CD1 PHE A 716 90.323 73.236 77.133 1.00 94.17 C
ATOM 4862 CD2 PHE A 716 92.112 74.107 75.774 1.00 94.87 C
ATOM 4863 CE1 PHE A 716 90.170 72.157 76.260 1.00 93.04 C
ATOM 4864 CE2 PHE A 716 91.971 73.037 74.894 1.00 93.41 C
ATOM 4865 CZ PHE A 716 90.996 72.057 75.137 1.00 95.29 C
My code is:
open (FILE, $ARGV[0])
    or die "Could not open file\n";
my @newlines;
while ( my $line = <FILE> ) {
    if ($line =~ m/^ATOM.*/) {
        push @newlines, $line;
    }
}
##############################################################
#This function will take out the element from each line
#The element is from column 77 and contains one or two letters
#The function will also sort each element alphabetically and count them
sub atomfreq {
    foreach my $record1 (@newlines) {
        my @element = substr($record1, 76, 2);
        my @sortelement = sort(@element);
        print "@sortelement\n";
    }
}
Thanks.
For counting the occurrences of the elements C, O and N (and others), a hash is the proper data structure.
Use the elements as keys and increment the corresponding value. After the loop you can then print the hash sorted by its keys:
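sub atomfreq {
    my %count;
    foreach my $record1 (@newlines) {
        # Columns 77-78 hold the element symbol (offset 76, length 2)
        my $element = substr($record1, 76, 2);
        # Drop padding so " C" and "C" are counted under the same key
        $element =~ s/\s+//g;
        $count{$element} += 1;
    }
    # Print the counts in alphabetical order of the element symbol
    foreach my $element ( sort keys %count ) {
        print "element=$element, count=$count{$element}\n";
    }
}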
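If you would rather not depend on the global @newlines, the same hash-counting idea can be wrapped in a function that takes the lines as an argument. The following is only a sketch: the name count_elements and the sample records built with sprintf are made up for illustration, and it assumes the element symbol sits right-justified in columns 77-78, as stated in the question.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): takes a reference to an
# array of PDB lines and returns a hash reference of element => count.
sub count_elements {
    my ($lines) = @_;
    my %count;
    for my $line (@$lines) {
        next unless $line =~ /^ATOM/;     # only ATOM records
        next if length($line) < 78;       # element columns are missing
        my $element = substr($line, 76, 2);
        $element =~ s/\s+//g;             # " C" becomes "C"
        next if $element eq '';
        $count{$element}++;
    }
    return \%count;
}

# Synthetic fixed-width records: pad to 76 characters, then put the element
# right-justified in a 2-character field (assumed layout, illustration only).
my @sample = (
    sprintf("%-76s%2s", "ATOM   4851  O   PRO A 715", "O"),
    sprintf("%-76s%2s", "ATOM   4852  CB  PRO A 715", "C"),
    sprintf("%-76s%2s", "ATOM   4853  CG  PRO A 715", "C"),
    sprintf("%-76s%2s", "ATOM   4855  N   PHE A 716", "N"),
    "HETATM this line is ignored",
);

my $count = count_elements(\@sample);

# sort keys gives the alphabetical order asked for in the question
for my $element (sort keys %$count) {
    print "element=$element, count=$count->{$element}\n";
}
```

Returning a hash reference keeps the counting separate from the printing, so the caller can sort by count instead of by key if that is ever needed.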