I installed and configured emboss and can run the simple command line arguments for getting the consensus of one previously aligned multifasta file:
% cons
Create a consensus sequence from a multiple alignment
Input (aligned) sequence set: dna.msf
output sequence [dna.fasta]: aligned.cons
This is perfect for dealing with one file at a time, but I have hundreds to process. I have started to write a perl script with a foreach loop to try and process this for every file, but I guess I need to be outside of the script to run these commands. Any clue on how I can run a command line friendly program for getting a single consensus sequence in fasta format from a previously aligned multifasta file, for many files in succession? I don't have to use emboss- I could use another program. Here is my code so far:
#!/usr/bin/perl
use warnings;
use strict;
my $dir = ("/Users/roblogan/Documents/Clustered_Barcodes_Aligned");
my @ArrayofFiles = glob "$dir/*"; #put all files in the directory into an array
#print join("\n", @ArrayofFiles), "\n"; #diagnostic print
foreach my $file (@ArrayofFiles){
print 'cons', "\n";
print "/Users/roblogan/Documents/Clustered_Barcodes_Aligned/Clustered_Barcode_Number_*.*.Sequences.txt.out", "\n";
print "*.*.Consensus.txt", "\n";
}
EMBOSS cons has two mandatory qualifier:
- sequence( to provide the input sequence)
- outseq (for output).
so you need to provide the above to field .
Now change your code little bit to run multiple program:
my $count=1;
foreach my $file (@ArrayofFiles){
$output_path= "/Users/roblogan/Documents/Clustered_Barcodes_Aligned/";
my $output_file = $output_path. "out$count";# please change here to get your desired output filename
my $command = "cons -sequence '$file' -outseq '$output_file' ";
system($command);
$count ++;
}
Hope the above code will work for you.