Performing a function on each combination of variables in two arrays


I am trying to take one set of data and subtract every value of a second set of data from each of its values.

For example:

Data set one (1, 2, 3)
Data set two (1, 2, 3, 4, 5)

So I should get something like (1 - (1..5)), then (2 - (1..5)), and so on.
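Visiting every combination comes down to two nested loops: the outer loop walks data set one and the inner loop walks data set two. Here is a minimal sketch using the toy numbers from the question. The names `@set_one`, `@set_two` and `@diff` are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @set_one = (1, 2, 3);
my @set_two = (1 .. 5);

# $diff[$i][$j] holds $set_one[$i] - $set_two[$j] for every pair (i, j)
my @diff;
for my $i (0 .. $#set_one) {
    for my $j (0 .. $#set_two) {
        $diff[$i][$j] = $set_one[$i] - $set_two[$j];
    }
}

# Print one row per value of data set one
for my $i (0 .. $#set_one) {
    print "$set_one[$i] - (@set_two) = (@{ $diff[$i] })\n";
}
```

The result is a 3 x 5 table (an array of array references), so `$diff[1][3]` is 2 - 4.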

I currently have:

#!/usr/bin/perl
use strict;
use warnings;

my $inputfile = $ARGV[0];

open( my $input_fh, "<", $inputfile ) or die "Cannot open $inputfile: $!";

my @array = <$input_fh>;
close $input_fh;

my $protein = 'PROT';
my $chain   = 'P';
my $protein_coords;

for ( my $line = 0; $line <= $#array; ++$line ) {
    if ( $array[$line] =~ m/\s+$protein\s+/ ) {
        chomp $array[$line];
        my @splitline = ( split /\s+/, $array[$line] );
        my %coordinates = (
            x => $splitline[5],
            y => $splitline[6],
            z => $splitline[7],
        );
        push @{ $protein_coords->[0] }, \%coordinates;
    }
}

print "$protein_coords->[0]->[0]->{'z'} \n";

my $lipid1 = 'MEM1';
my $lipid2 = 'MEM2';
my $lipid_coords;

for ( my $line = 0; $line <= $#array; ++$line ) {
    if ( $array[$line] =~ m/\s+$lipid1\s+/ || $array[$line] =~ m/\s+$lipid2\s+/ ) {
        chomp $array[$line];
        my @splitline = ( split /\s+/, $array[$line] );
        my %coordinates = (
            x => $splitline[5],
            y => $splitline[6],
            z => $splitline[7],
        );
        push @{ $lipid_coords->[1] }, \%coordinates;
    }
}

print "$lipid_coords->[1]->[0]->{'z'} \n";
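The two loops above differ only in which labels they match. They could be folded into one subroutine, which also makes it easy to keep an identifier for the ResID step. This is a sketch under assumptions: `collect_coords` is a hypothetical helper, the sample lines are made up, x, y and z are taken from whitespace-split fields 5 to 7 as in the question, and treating field 4 as the residue number is a guess about the file layout. It also drops the extra `[0]`/`[1]` level and returns a flat array reference.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect coordinates from every line that contains one
# of the given labels as a whitespace-separated token. Field 4 as the residue
# number is an ASSUMPTION about the file layout.
sub collect_coords {
    my ( $lines, @labels ) = @_;
    my @coords;
    LINE: for my $line (@$lines) {
        for my $label (@labels) {
            if ( $line =~ /\s\Q$label\E(?:\s|\z)/ ) {
                my @f = split /\s+/, $line;
                push @coords, { resid => $f[4], x => $f[5], y => $f[6], z => $f[7] };
                next LINE;    # never record the same line twice
            }
        }
    }
    return \@coords;
}

# Made-up sample lines in the same whitespace layout as the question's file
my @lines = (
    "ATOM      1  N   ALA     1      11.104   6.134  -6.504  1.00  0.00      PROT",
    "ATOM      2  P   POPC   10      12.000   7.000  -2.000  1.00  0.00      MEM1",
    "ATOM      3  P   POPC   11      30.000   7.000  -2.000  1.00  0.00      MEM2",
);

my $protein_coords = collect_coords( \@lines, 'PROT' );
my $lipid_coords   = collect_coords( \@lines, 'MEM1', 'MEM2' );

print scalar(@$protein_coords), " protein atoms, ",
      scalar(@$lipid_coords),   " lipid atoms\n";
```

Ending the pattern with `(?:\s|\z)` rather than `\s+` means a label at the very end of a chomped line still matches.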

I am trying to compute every value of $protein_coords->[0]->[$ticker]->{'z'} minus each value of $lipid_coords->[1]->[$ticker]->{'z'}, i.e. one difference for every protein/lipid combination.
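The protein and lipid arrays each need their own index. A single `$ticker` only pairs atom 0 with atom 0, atom 1 with atom 1, and so on. The following sketch keeps the question's `$protein_coords->[0]` / `$lipid_coords->[1]` layout but fills it with made-up values so it runs on its own.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample data in the question's layout
my $protein_coords = [ [ { x => 0, y => 0, z => 1.0 }, { x => 0, y => 0, z => 2.5 } ] ];
my $lipid_coords;
$lipid_coords->[1] = [
    { x => 0, y => 0, z => 0.5 },
    { x => 0, y => 0, z => 4.0 },
    { x => 0, y => 0, z => -1.0 },
];

# Two independent indices: $i walks the protein atoms, $j the lipid atoms
my @dz;
for my $i ( 0 .. $#{ $protein_coords->[0] } ) {
    for my $j ( 0 .. $#{ $lipid_coords->[1] } ) {
        $dz[$i][$j] = $protein_coords->[0][$i]{z} - $lipid_coords->[1][$j]{z};
    }
}

for my $i ( 0 .. $#dz ) {
    print "protein atom $i: @{ $dz[$i] }\n";
}
```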

My overall objective is to find (z2-z1)^2 in the distance equation d = sqrt((x2-x1)^2 + (y2-y1)^2 + (z2-z1)^2). I think that if I can do this once, I can do the same for x and y. Specifically, I am trying to find the distance between every protein atom in a PDB file and every lipid atom in the same file, and print the ResID wherever that distance is less than 5 Å.
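Here is one way the whole distance test could look. It is a sketch, not a drop-in replacement. The coordinates are made up, and the `resid` key is an assumption because the question's hashes only store x, y and z. The sketch compares the squared distance with 5^2, which yields the same result as comparing d with 5 but avoids calling `sqrt` for every pair.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $cutoff = 5.0;    # Angstroms

# Made-up coordinates; 'resid' is an ASSUMED extra key
my @protein = (
    { resid => 1, x => 0.0,  y => 0.0, z => 0.0 },
    { resid => 2, x => 10.0, y => 0.0, z => 0.0 },
    { resid => 2, x => 10.0, y => 1.0, z => 0.0 },
);
my @lipid = (
    { resid => 101, x => 3.0, y => 4.0, z => 0.0 },    # exactly 5 A from resid 1
    { resid => 102, x => 0.0, y => 0.0, z => 4.0 },    # 4 A from resid 1
);

my %close_resids;
for my $p (@protein) {
    for my $l (@lipid) {
        my $dx = $l->{x} - $p->{x};
        my $dy = $l->{y} - $p->{y};
        my $dz = $l->{z} - $p->{z};
        # d = sqrt(dx^2 + dy^2 + dz^2); d < cutoff  <=>  d^2 < cutoff^2
        if ( $dx**2 + $dy**2 + $dz**2 < $cutoff**2 ) {
            $close_resids{ $p->{resid} } = 1;
            last;    # one close lipid atom is enough for this protein atom
        }
    }
}

print "Residues within $cutoff A of a lipid: ",
      join( ', ', sort { $a <=> $b } keys %close_resids ), "\n";
```

Using a hash for `%close_resids` means each residue is reported once, even when several of its atoms are close to a lipid.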


Solution

  • The easiest way to do this is to do your calculations while you're looping over the lipid atoms (your second loop):

    for (my $line = 0; $line <= $#array; ++$line) {
        # use the logical || here; the bitwise | only works by accident
        if ($array[$line] =~ m/\s+$lipid1\s+/ || $array[$line] =~ m/\s+$lipid2\s+/) {
            chomp $array[$line];
            my @splitline = (split /\s+/, $array[$line]);
            my %coordinates = (x => $splitline[5],
                               y => $splitline[6],
                               z => $splitline[7],
                              );
            push @{$lipid_coords->[1]}, \%coordinates;

            # go through each set of protein coordinates in your array;
            # $p is a hash reference, not an index, so read $p->{z}
            for my $p (@{$protein_coords->[0]}) {
                # you can store this value however you want...
                my $difference = $p->{z} - $coordinates{z};
            }
        }
    }
    

    If I were you, I would use some form of unique identifier (for example, the atom serial number) to access the data for each combination, e.g. build a hash of the form $difference->{<protein_id>}{<lipid_id>} = <difference>.
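A minimal sketch of that hash-of-hashes idea. It assumes each coordinate hash carries a hypothetical `id` key, such as an atom serial number, and the values are made up.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical 'id' keys (e.g. atom serial numbers); values are made up
my @protein = ( { id => 1,   z => 1.0 }, { id => 2,   z => 2.5 } );
my @lipid   = ( { id => 501, z => 0.5 }, { id => 502, z => 4.0 } );

# $difference->{protein_id}{lipid_id} = protein z - lipid z
my $difference = {};
for my $p (@protein) {
    for my $l (@lipid) {
        $difference->{ $p->{id} }{ $l->{id} } = $p->{z} - $l->{z};
    }
}

# Any combination can now be looked up directly by its two IDs
for my $pid ( sort { $a <=> $b } keys %$difference ) {
    for my $lid ( sort { $a <=> $b } keys %{ $difference->{$pid} } ) {
        print "protein $pid - lipid $lid: $difference->{$pid}{$lid}\n";
    }
}
```

The same structure works for the full squared distance. Store `dx**2 + dy**2 + dz**2` in place of the z difference.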