I have a problem with a very large text file that looks like the following:
A T T A G C A
A AT A G C A
T TT AG G A
G T T A G C A
Every character is supposed to be separated by a tab (\t), but some characters are run together. I want to insert a \t between them. What I need is the following:
A T T A G C A
A A T A G C A
T T T A G G A
G T T A G C A
How can I do this in Python? I would also like to make full use of my computer's memory to speed up the process.
Assuming the input is stored in in.txt, an elegant solution would be:
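import re

# \w matches a single letter, digit, or underscore, so "AT" yields ["A", "T"].
WORD_CHAR = re.compile(r'\w')

with open('in.txt') as fin, open('out.txt', 'w') as fout:
    for line in fin:
        # Re-join the individual characters of each line with tabs.
        fout.write('\t'.join(WORD_CHAR.findall(line)) + '\n')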
The output is written to the file out.txt.
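For data like this, where every token is meant to be a single character, the regular expression is not strictly required: deleting the whitespace from a line and joining what is left with \t gives the same result. The sketch below assumes that; note that unlike \w it keeps punctuation rather than dropping it. On the memory question, the loop already streams the file one line at a time, so memory use stays small no matter how large the file is; the setting that plausibly matters more is the I/O buffer size. The names split_line and convert and the 16 MiB buffer size are hypothetical choices, not measured optima.

```python
# Translation table that deletes spaces, tabs, and line endings.
_DELETE_WS = str.maketrans("", "", " \t\r\n")

# Assumed buffer size; tune it for your own disk and OS.
BUFFER_SIZE = 16 * 1024 * 1024  # 16 MiB


def split_line(line: str) -> str:
    """Put a tab between every non-whitespace character of one line."""
    chars = line.translate(_DELETE_WS)
    return "\t".join(chars) + "\n"


def convert(src: str, dst: str, buffer_size: int = BUFFER_SIZE) -> None:
    """Stream src to dst line by line, using large read/write buffers."""
    with open(src, buffering=buffer_size) as fin, \
         open(dst, "w", buffering=buffer_size) as fout:
        # writelines consumes the generator lazily, so only one line
        # (plus the buffers) is held in memory at a time.
        fout.writelines(split_line(line) for line in fin)
```

Call it as convert('in.txt', 'out.txt'). Whether a larger buffer actually helps depends on the disk and OS, so it is worth timing a few sizes on a sample of the real file.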