python split position

Separating letters in file based on position


I have a .fa file containing a sequence of letters such as ACGGGGTTTTGGGCCCGGGGG, and a .txt file with numbers that give start and stop positions, e.g. start 2, stop 7. How can I extract only the letters at those positions from my .fa file and write them to a new file? I wrote the code below, but I get the error "string index out of range". My positions .txt file is just a list of positions like [[1,52],[66,88].....

my_file = open('dna.fa')
transcript = my_file.read()
positions = open('exons.txt')
positions = positions.read()
coding_sequence = '' # declare the variable

for i in xrange(len(positions)):
    start = positions[i][0]
    stop = positions[i][1]
    exon = transcript[start:stop]
    coding_sequence = coding_sequence + exon
print coding_sequence

Solution

  • Assuming that your positions are stored in a list of 0-based integer indices called positions (one index per letter to keep, not [start, stop] pairs), that the name of your infile is infile.fa, and the name of your outfile is outfile.fa:

    with open("infile.fa") as infile:
        text = infile.read()  # read the whole file into memory
    # pick out the character at each index in positions
    letters = "".join(text[i] for i in positions)
    with open("outfile.fa", "w") as outfile:
        outfile.write(letters)
    

    As @KIDJourney's comment points out, this can fail if the file is too large to fit in memory. In that case, you can read the input line by line and write each selected character as you go:

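    wanted = set(positions)  # set membership is much faster than scanning a list
    with open("infile.fa") as infile:
        with open("outfile.fa", "w") as outfile:  # "w" starts from an empty file
            i = 0  # index of the current character within the whole file
            for line in infile:
                for char in line:
                    if i in wanted:
                        outfile.write(char)
                    i += 1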
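
The "string index out of range" error in the question comes from positions.read() returning one long string: positions[i] is then a single character, so positions[i][1] does not exist. Below is a minimal sketch of one way to handle [start, stop] pairs directly. It rests on assumptions that are not stated in the question: the positions file holds a Python-style nested list such as [[1,52],[66,88]], positions are 1-based and inclusive, and any line in the .fa file starting with ">" is a header to skip. The function names are made up for illustration.

```python
import ast


def read_sequence(fasta_text):
    """Join the sequence lines of FASTA text, skipping '>' header lines."""
    return "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if line.strip() and not line.startswith(">")
    )


def parse_ranges(positions_text):
    """Parse text such as '[[1,52],[66,88]]' into a list of (start, stop) tuples."""
    return [(int(start), int(stop)) for start, stop in ast.literal_eval(positions_text)]


def extract_ranges(sequence, ranges):
    """Concatenate the letters covered by each (start, stop) range.

    Assumption: positions are 1-based and inclusive, so start=2, stop=7
    selects the 2nd through the 7th letter.
    """
    pieces = []
    for start, stop in ranges:
        if not 1 <= start <= stop <= len(sequence):
            raise ValueError("range %r is outside the sequence" % ((start, stop),))
        pieces.append(sequence[start - 1:stop])  # convert to a 0-based slice
    return "".join(pieces)


def extract_to_file(fasta_path, positions_path, out_path):
    """Read both input files and write the selected letters to out_path."""
    with open(fasta_path) as fasta_file:
        sequence = read_sequence(fasta_file.read())
    with open(positions_path) as positions_file:
        ranges = parse_ranges(positions_file.read())
    with open(out_path, "w") as out_file:
        out_file.write(extract_ranges(sequence, ranges))
```

With the file names from the question, this would be called as extract_to_file("dna.fa", "exons.txt", "outfile.fa"). If your positions are 0-based or the stop is exclusive, adjust the slice in extract_ranges accordingly.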