python list iterator combinations python-itertools

How to make combinations of two or more specific letters?


I am a novice in Python and have been struggling with this for the past week. Could someone help me with this problem? It would be very helpful for finishing my project.

I am trying to generate single mutations, and combinations of two and three of them, from user input for a given sequence:

INPUT SEQUENCE: >PEACCEL

USER MUTATION INPUT FILE:
E2R
C4W
E6G

#!/usr/bin/python
import getopt
import sys
from itertools import groupby


def main(argv):
    try:
        # Long options that take a value end in "=" (no colon).
        opts, operands = getopt.getopt(
            argv, 'i:m:o:',
            ["INPUT_FILE=", "MUTATION_FILE=", "OUTPUT_FILE=", "help"])
        if len(opts) == 0:
            print("Please use the correct arguments, for usage type --help ")
            sys.exit(2)
        else:
            for option, value in opts:
                if option == "-i" or option == "--INPUT_FILE":
                    seq = inputFile(value)
                if option == "-m" or option == "--MUTATION_FILE":
                    conA = MutationFile(value)
                if option == "-o" or option == "--OUTPUT_FILE":
                    # outputName() is not shown in this excerpt
                    out = outputName(value)
            return seq, conA
    except getopt.GetoptError as err:
        print(str(err))
        print("Please use the correct arguments, for usage type --help")
        sys.exit(2)


def inputFile(value):
    try:
        fh = open(value, 'r')
    except IOError:
        print("The file %s does not exist \n" % value)
    else:
        # groupby alternates between header lines (starting with ">")
        # and the sequence lines that follow them.
        ToSeperate = (x[1] for x in groupby(fh, lambda line: line[0] == ">"))
        for header in ToSeperate:
            header = next(header)[1:].strip()
            Sequence = "".join(s.strip() for s in next(ToSeperate))
            return Sequence


def MutationFile(value):
    try:
        fh = open(value, 'r')
        content = fh.read()
        Rmcontent = str(content.rstrip())
    except IOError:
        print("The file %s does not exist \n" % value)
    else:
        # One list element per character, e.g. ['E', '2', 'R', '\n', 'C', ...]
        con = list(Rmcontent)
        return con


def Mutation(SEQUENCES, conA):
    R = len(conA)
    if R > 1:
        out = []
        SecondNum = 1  # index of the position digit in each 4-character record
        ThirdChar = 2  # index of the new residue in each 4-character record
        for index in range(len(conA)):
            MR = conA[index]
            if index == SecondNum:
                SN = MR
                SecondNum = SecondNum + 4
            if index == ThirdChar:
                TC = MR
                ThirdChar = ThirdChar + 4

                SecNum = int(SN.rstrip())
                MutateResidue = str(TC.rstrip())
                for index in range(len(SEQUENCES)):
                    if index == SecNum - 1:
                        NonMutate = SEQUENCES[index]
                        AfterMutate = NonMutate.replace(NonMutate, MutateResidue)
                        new = SEQUENCES[:index] + AfterMutate + SEQUENCES[index + 1:]
                        MutatedInformation = ['>', NonMutate, index + 1, MutateResidue, '\n', new]
                        values2 = ''.join(str(i) for i in MutatedInformation)


if __name__ == "__main__":
    seq, conA = main(sys.argv[1:])
    Mutation(seq, conA)

This is the part of my program where I replace E, C, E at positions 2, 4, 6 with R, W, G and store the mutated sequences, which gives three lines like this:

PrACCEL
PEAwCEL
PEACCgL

Now I want to make combinations of two and three of these single mutations, that is, two mutations applied together in one line and all three mutations applied together in one line.

The expected output for this sample looks like this:

2C
   PrAwCEL 
   PrACCgL 
   PEAwCgL 
3C
   PrAwCgL 

Algorithm

This is only part of my code, so I will explain my algorithm:

1. I read the mutation file, where each line has three characters, e.g. E2R: E is the amino acid letter, 2 is its position in the input sequence PEACCEL, and R is the residue that E2 is mutated to.

2. I first extract the position and the third character from the mutation file and store them in the variables SecNum and MutateResidue (ThirdChar).

3. Then I use a for loop to read the sequence (PEACCEL) by index; wherever the index matches SecNum (2, 4, 6), I replace that residue with MutateResidue, the third character in the mutation file (2R, 4W, 6G).

4. Finally, I join the mutated residue with the rest of the sequence using this line: new=SEQUENCES[:index]+AfterMutate+SEQUENCES[index+1: ]
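
For reference, the four steps above could be sketched roughly as follows for a single mutation. This is a minimal Python 3 sketch, not the code from the program above: parse_mutation and apply_mutation are hypothetical helper names, and it assumes every mutation code has the form "original residue, 1-based position, new residue" (e.g. E2R).

```python
import re


def parse_mutation(code):
    """Split a code such as 'E2R' into (original, position, replacement)."""
    match = re.fullmatch(r"([A-Za-z])(\d+)([A-Za-z])", code.strip())
    if match is None:
        raise ValueError("not a valid mutation code: %r" % code)
    original, position, replacement = match.groups()
    return original, int(position), replacement


def apply_mutation(sequence, code):
    """Return a copy of sequence with the single mutation in code applied."""
    original, position, replacement = parse_mutation(code)
    index = position - 1  # mutation positions are 1-based, Python indices 0-based
    if sequence[index] != original:
        raise ValueError("%s expects %s at position %d, found %s"
                         % (code, original, position, sequence[index]))
    # Same slicing idea as in step 4: prefix + new residue + suffix.
    return sequence[:index] + replacement + sequence[index + 1:]


if __name__ == "__main__":
    for code in ["E2R", "C4W", "E6G"]:
        print(apply_mutation("PEACCEL", code))
```

The check against the original residue is an addition of this sketch; it only guards against a mutation file that does not match the sequence.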

Thanks in advance


Solution

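  • from itertools import combinations, chain
    from collections import Counter


    def Mutation(SEQUENCES, conA):
        # conA is a list of mutation codes such as ['E2R', 'C4W', 'E6G'], read for example with:
        # mutations=map(lambda x:x.strip(),open('a.txt','r').readlines())

        # All combinations of 1, 2 and 3 mutations, flattened into one iterable.
        mutation_combinations = chain.from_iterable(
            [list(combinations(conA, i)) for i in range(1, 4)])
        # [('E2R',), ('C4W',), ('E6G',), ('E2R', 'C4W'), ('E2R', 'E6G'), ('C4W', 'E6G'), ('E2R', 'C4W', 'E6G')]

        for i in mutation_combinations:
            print("combination :" + '_'.join(i))
            c = Counter({})  # used as a plain dict: position -> new (lower-case) letter
            temp_string = SEQUENCES
            for j in i:
                # j[1] is the position and j[2] the new residue (single-digit positions only)
                c[j[1]] = j[2].lower()
            for index, letter in c.items():
                temp_string = temp_string[:int(index) - 1] + letter + temp_string[int(index):]
            print(temp_string)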
    

    output

    combination :E2R
    PrACCEL
    combination :C4W
    PEAwCEL
    combination :E6G
    PEACCgL
    combination :E2R_C4W
    PrAwCEL
    combination :E2R_E6G
    PrACCgL
    combination :C4W_E6G
    PEAwCgL
    combination :E2R_C4W_E6G
    PrAwCgL
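
Note that the Mutation function in this answer expects conA to be a list of whole mutation codes such as ['E2R', 'C4W', 'E6G'], whereas MutationFile in the question returns a list of single characters (list(Rmcontent)). A minimal Python 3 sketch of a reader that returns whole codes could look like this; read_mutations is a hypothetical name, and the sketch assumes one mutation code per line.

```python
import os
import tempfile


def read_mutations(path):
    """Return the non-empty lines of the mutation file as a list of codes."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


if __name__ == "__main__":
    # Write a small example file so the sketch can be run on its own.
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("E2R\nC4W\nE6G\n")
    print(read_mutations(tmp.name))
    os.remove(tmp.name)
```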
    

    Algorithm I followed:

    1. Read the mutation codes (E2R, ...) from a file using mutations=map(lambda x:x.strip(),open('a.txt','r').readlines())

    2. Build the combinations of the mutations with mutation_combinations= chain.from_iterable([list(combinations(mutations,i))for i in range(1,4)]). If you have 4 mutations and also want the combination of all four, change the upper bound of range to 5.

    3. For each combination, replace the residues at the given positions with the specified characters:

      for j in i:
          c[j[1]]=j[2].lower()
      

    I used the Counter c above to keep track of which position gets which replacement character within each combination.
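
A variant of the same idea, as a hedged Python 3 sketch rather than a drop-in replacement: it reads the position with a regular expression, so positions above 9 also work (the code above takes the position from j[1], a single character), it derives the largest combination size from the number of mutations, and it groups the results by size as in the 2C / 3C layout the question asks for. The names mutate and combined_mutations are hypothetical.

```python
import re
from itertools import combinations

# original residue, 1-based position (one or more digits), new residue
MUTATION_RE = re.compile(r"([A-Za-z])(\d+)([A-Za-z])")


def mutate(sequence, codes):
    """Apply every mutation code in codes to sequence (new residues in lower case)."""
    residues = list(sequence)
    for code in codes:
        match = MUTATION_RE.fullmatch(code)
        if match is None:
            raise ValueError("not a valid mutation code: %r" % code)
        position = int(match.group(2))
        residues[position - 1] = match.group(3).lower()
    return "".join(residues)


def combined_mutations(sequence, codes, min_size=2):
    """Map each combination size to the list of mutated sequences of that size."""
    result = {}
    for size in range(min_size, len(codes) + 1):
        result[size] = [mutate(sequence, combo)
                        for combo in combinations(codes, size)]
    return result


if __name__ == "__main__":
    grouped = combined_mutations("PEACCEL", ["E2R", "C4W", "E6G"])
    for size, sequences in grouped.items():
        print("%dC" % size)
        for mutated in sequences:
            print("   " + mutated)
```

Passing min_size=1 would also include the single mutations.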