Search code examples
pythonperlbiopythonbioperl

Calculating the distance between atomic coordinates


I have a text file as shown below

ATOM    920  CA  GLN A 203      39.292 -13.354  17.416  1.00 55.76           C 
ATOM    929  CA  HIS A 204      38.546 -15.963  14.792  1.00 29.53           C
ATOM    939  CA  ASN A 205      39.443 -17.018  11.206  1.00 54.49           C  
ATOM    947  CA  GLU A 206      41.454 -13.901  10.155  1.00 26.32           C
ATOM    956  CA  VAL A 207      43.664 -14.041  13.279  1.00 40.65           C 
.
.
.

ATOM    963  CA  GLU A 208      45.403 -17.443  13.188  1.00 40.25           C  

I would like to calculate the distance between two alpha carbon atoms i.e calculate the distance between first and second atom and then between second and third atom and so on..... The distance between two atoms can be expressed as:distance = sqrt((x1-x2)^2+(y1-y2)^2+(z1-z2)^2) .

The columns 7,8 and 9 represents x,y and z co-ordinates respectively.I need to print the distance and the corresponding residue pairs(column 4) as shown below.(the values of distance are not real)

GLN-HIS   4.5
HIS-ASN   3.2
ASN-GLU   2.5

How can I do this calculation with perl or python?


Solution

  • If your data is separated by whitespace, a simple split can do the job. Buffering the lines to compare them to each other sequentially.

    use strict;
    use warnings;
    
    my @line;
    while (<>) {
        push @line, $_;            # add line to buffer
        next if @line < 2;         # skip unless buffer is full
        print proc(@line), "\n";   # process and print 
        shift @line;               # remove used line 
    }
    
    sub proc {
        my @a = split ' ', shift;   # line 1
        my @b = split ' ', shift;   # line 2
        my $x = ($a[6]-$b[6]);      # calculate the diffs
        my $y = ($a[7]-$b[7]);
        my $z = ($a[8]-$b[8]);
        my $dist = sprintf "%.1f",                # format the number
                       sqrt($x**2+$y**2+$z**2);   # do the calculation
        return "$a[3]-$b[3]\t$dist"; # return the string for printing
    }
    

    Output (with the sample data):

    GLN-HIS 3.8
    HIS-ASN 3.8
    ASN-GLU 3.9
    GLU-VAL 3.8
    

    If your data is tab separated, you can split on /\t/ instead of ' '.