python, dna-sequence

Retrieving character's flanking region


I'm trying to locate a particular character, "r", in a line and then retrieve the 35 characters flanking it on each side. There could be more than one "r" in a line, so I'm trying to obtain the flanking region for every one of them. I've been trying the code below, but I'm only getting the headers (name and id) in the output and I can't figure out why. Any advice?

fhand=open("input.txt")
target = open ("output.txt", "a")
for line in fhand:
    name, id, seq= line.split("\t")
    while atpos < len(seq):
        if atpos == -1:
            break
        atpos = seq.find ("r")
        seq2 = seq[(atpos-35):(atpos+36)]
        line2= name + "\t"+ id + "\t" + seq2 + "\n"
        target.write(line2)
        atpos += 1

print ("Sequences obtained successfully")
target.close()

Solution

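import csv

# newline="" lets the csv module handle line endings itself
with open("input.txt", newline="") as infile, open("output.txt", "w", newline="") as fout:
    outfile = csv.writer(fout, delimiter="\t")
    for name, id, seq in csv.reader(infile, delimiter="\t"):
        # indices of every "r" in the sequence
        locs = [i for i, char in enumerate(seq) if char == "r"]
        for loc in locs:
            # clamp the left edge at 0 so a negative start index does not wrap around
            outfile.writerow([name, id, seq[max(loc - 35, 0) : loc + 36]])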
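
If you would rather keep the shape of the original loop, here is a rough sketch of how it might be repaired. As posted, `atpos` is never assigned before the `while` test, `seq.find("r")` always restarts from index 0 so the search never advances, and `atpos-35` goes negative whenever an "r" sits within the first 35 characters, which makes the slice wrap around and usually come back empty (a likely reason only the name and id show up). The helper name `flanking_windows` and its `target` and `flank` parameters are made up for illustration and are not part of the question or the answer above.

```python
def flanking_windows(seq, target="r", flank=35):
    """Return one window per occurrence of `target` in `seq`,
    with up to `flank` characters on each side (hypothetical helper)."""
    windows = []
    pos = seq.find(target)                # first occurrence, or -1 if none
    while pos != -1:
        start = max(pos - flank, 0)       # clamp so a negative index cannot wrap around
        windows.append(seq[start:pos + flank + 1])
        pos = seq.find(target, pos + 1)   # resume the search just after this hit
    return windows


# Small demonstration with a flank of 2 instead of 35
print(flanking_windows("aaraar", flank=2))  # → ['aaraa', 'aar']
```

Passing a start position as the second argument of `str.find` is what moves the search forward. Windows near either end of the sequence simply come out shorter than 71 characters.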