
How to count each element from a pdb file


I am having trouble counting how many times each element occurs in a PDB file. So far, my code extracts the lines that start with ATOM, and I can print the element from each of them. Now I want to count each element (C, O, N, and so on) and list the counts in alphabetical order. The data looks like this:

 ATOM   4851  O   PRO A 715      89.164  76.083  75.292  1.00 99.41           O  
 ATOM   4852  CB  PRO A 715      88.324  78.267  73.865  1.00 95.88           C  
 ATOM   4853  CG  PRO A 715      88.836  78.838  72.540  1.00 95.93           C  
 ATOM   4854  CD  PRO A 715      90.288  78.546  72.593  1.00 94.99           C  
 ATOM   4855  N   PHE A 716      90.009  77.320  76.994  1.00100.00           N  
 ATOM   4856  CA  PHE A 716      90.100  76.203  77.966  1.00 99.66           C  
 ATOM   4857  C   PHE A 716      89.942  76.667  79.419  1.00 99.17           C  
 ATOM   4858  O   PHE A 716      88.895  76.334  80.027  1.00100.00           O  
 ATOM   4859  CB  PHE A 716      91.409  75.402  77.819  1.00 97.04           C  
 ATOM   4860  CG  PHE A 716      91.290  74.219  76.899  1.00 96.00           C  
 ATOM   4861  CD1 PHE A 716      90.323  73.236  77.133  1.00 94.17           C  
 ATOM   4862  CD2 PHE A 716      92.112  74.107  75.774  1.00 94.87           C  
 ATOM   4863  CE1 PHE A 716      90.170  72.157  76.260  1.00 93.04           C  
 ATOM   4864  CE2 PHE A 716      91.971  73.037  74.894  1.00 93.41           C  
 ATOM   4865  CZ  PHE A 716      90.996  72.057  75.137  1.00 95.29           C  

My code is :

open(my $fh, '<', $ARGV[0])
    or die "Could not open file '$ARGV[0]': $!\n";

# Keep only the ATOM records
my @newlines;
while ( my $line = <$fh> ) {
    if ($line =~ m/^ATOM/) {
        push @newlines, $line;
    }
}
close($fh);

##############################################################
#This function will take out the element from each line
#The element is from column 77 and contains one or two letters
#The function will also sort each element alphabetically and count them
sub atomfreq {
    foreach my $record1(@newlines) {
      my @element = substr($record1, 76, 2);
      my @sortelement = sort(@element);
      print "@sortelement\n";
    }
}

Thanks.


Solution

  • To count the occurrences of C, O, N, and any other elements, a hash is the right data structure. Use each element symbol as a key and increment its value once per record. After the loop, print the hash sorted by its keys:

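    sub atomfreq {
        my %count;
        foreach my $record1 (@newlines) {
            # Element symbol: columns 77-78, right-justified (e.g. " C")
            my $element = substr($record1, 76, 2);
            $element =~ s/\s+//g;    # strip the padding so " C" becomes "C"
            $count{$element}++;
        }
        # Print the counts in alphabetical order of element symbol
        foreach my $element ( sort keys %count ) {
            print "element=$element, count=$count{$element}\n";
        }
    }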
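    For completeness, here is a minimal self-contained sketch that combines the file reading from the question with the hash-based count. The name count_elements and the tab-separated output are my own choices, not something from the question. It assumes records start in column 1, so the element symbol sits in columns 77-78 as in the sample data. Lines too short to carry an element field are skipped.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Count element symbols in PDB ATOM records.
# Takes a list of lines; returns a hash reference (symbol => count).
# count_elements is a hypothetical helper name used for this sketch.
sub count_elements {
    my @lines = @_;
    my %count;
    for my $line (@lines) {
        next unless $line =~ /^ATOM/;        # keep ATOM records only
        next unless length($line) > 76;      # line too short for an element field
        my $element = substr($line, 76, 2);  # columns 77-78, right-justified
        $element =~ s/\s+//g;                # " C" becomes "C"; also drops a newline
        next if $element eq '';              # element field left blank
        $count{$element}++;
    }
    return \%count;
}

# Only read a file when a name is given on the command line.
if (@ARGV) {
    open(my $fh, '<', $ARGV[0])
        or die "Could not open file '$ARGV[0]': $!\n";
    my $count = count_elements(<$fh>);
    close($fh);

    # sort keys gives alphabetical order of the element symbols
    for my $element (sort keys %$count) {
        print "$element\t$count->{$element}\n";
    }
}
```

    Saved as, say, atomfreq.pl and run as `perl atomfreq.pl file.pdb`, this prints one element per line in alphabetical order. For the fifteen sample records in the question, that should be 12 for C, 1 for N and 2 for O.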