Search code examples
pythonbashsubprocessstdinbiopython

passing python variable (string) to bash command through echo pipe


I am having trouble passing a string in python (python variable) as input to a sequence alignment program (muscle) on the command line (bash). muscle can take stdin from the command line, e.g.;

~# echo -e ">1\nATTTCTCT\n>2\nATTTCTCC" | muscle

MUSCLE v3.8.31 by Robert C. Edgar

http://www.drive5.com/muscle
This software is donated to the public domain.
Please cite: Edgar, R.C. Nucleic Acids Res 32(5), 1792-97.

- 2 seqs, max length 8, avg  length 8
00:00:00     22 MB(2%)  Iter   1  100.00%  K-mer dist pass 1
00:00:00     22 MB(2%)  Iter   1  100.00%  K-mer dist pass 2
00:00:00     23 MB(2%)  Iter   1  100.00%  Align node       
00:00:00     23 MB(2%)  Iter   1  100.00%  Root alignment
>1
ATTTCTCT
>2
ATTTCTCC

It is this fasta alignment (the last 4 lines) that I am after - you can redirect the output of muscle (echo -e ">1\nATTTCTCT\n>2\nATTTCTCC" | muscle > out.file to get just the fasta alignment, which I need for downstream processing. BUt to get there, I have to pass 'muscle' the FASTA sequence string, which I think is best done via echo as above in bash.

So, the script takes two multiFASTA files and and pairs each FASTA sequence based on a list of ids for each file - this is working (though I realize it is perhaps not the most efficient way to go about it - I am new python user). Then I need to align each set in muscle before calculating distances/difference.

Here is what I have so far :

#! /env/python
from pairwise_distances import K2Pdistance
import pairwise_distances
import subprocess
from subprocess import call
import os     

fasta1=['']
for line in open('test1.fasta'):
    if not line.startswith('>'):
         fasta1.append(line.strip())

fasta2=['']
for line in open('test2.fasta'):
    if not line.startswith('>'):
         fasta2.append(line.strip())

for l1, l2 in zip(open('test1.list'), open ('test2.list')):
 try:
   a=fasta1[int(l1)]
 except IndexError,e:  
   a="GGG"

 try:
   b=fasta2[int(l2)]
 except (IndexError):
   b="CCC"

 temp_align=str(">1"+'\n'+a+'\n'+">2"+'\n'+b) 

 first=subprocess.check_output(['echo','-e',temp_align])
 print first
 subprocess.call(['bash','muscle'], stdin=first.stdout)
 print second

 #new=K2Pdistance(outfast1,outfast2) 
 #subprocess.Popen(['bash','muscle'], stdin=subprocess.check_output(['echo','-e',temp_align], stdout=subprocess.PIPE).std.out)`

The 'temp_align' variable is what I want to pass to muscle - it is the result of combining the appropriate fasta sequence from each multiFASTA file, for each loop over the ids/list, and is formatted like a FASTA file.

This issue is that I can echo the FASTA string, but I cant seem to "pipe" that via stdin to muscle... The main error I get is: AttributeError: 'str' object has no attribute 'stdout'.

~#python Beta3.py 
>1
ATCGACTACT
>2
ATCGCGCTACT

Traceback (most recent call last):
  File "Beta3.py", line 38, in <module>
    subprocess.call(['bash','muscle'], stdin=first.stdout)
AttributeError: 'str' object has no attribute 'stdout'

Can subprocess or other commandline modules take strings as stdin? IF not, I am not sure how I can echo and then pipe into muscle... Any other ideas would be much appreciated. I tried a few options from this thread unix.stackexchange

but no luck yet.

EDIT: Here are some example files:

~# cat test1.fasta 
>1
ATCGACTACT
>2
ACTCAGTCA
>3
TTCACAGGG
~# cat test2.fasta 
>1
ATCGCGCTACT
>2
GGCGTCAGTCA
>3
TTCAACCCCAGGG
~# cat test1.list 
1
2
3
~# cat test2.list 
1
3
2

EDIT 2: That previous post is related, but my question is about linking two bash commands starting with a python variable that is a string...and then, ideally, capturing the stdout of the second command back into a python variable... I dont quite see how to translate the answers on that post into a solution for my specific question...I guess I don;t fully understand what the poster was trying to do.


Solution

  • It seems that you want to communicate with the muscle process, then you need a PIPE, use this

    (out, err) = subprocess.Popen(['muscle'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE).communicate(temp_align)
    print out