
How to insert tabs between run-together characters in Python?


I have a very large text file that looks like the following:

A T T A G C A
A AT A G C A
T TT AG G A
G T T A G C A

Every character is supposed to be separated by \t, but some characters are run together (for example "AT" or "TT"). I want to insert \t between those characters as well, so that the output looks like this:

A T T A G C A
A A T A G C A
T T T A G G A
G T T A G C A

How can I do this in Python? Because the file is so large, I would also like to make full use of my computer's memory to speed up the process.


Solution

  • Assuming the input is stored in in.txt, a concise solution is:

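    import re
    
    # Each \w match is a single letter, digit or underscore, so runs such as
    # "AT" are split into separate characters and existing tabs are skipped.
    WORD_CHAR = re.compile(r'\w')  # raw string: a plain '\w' is an invalid escape
    
    # Read and write one line at a time so memory use stays small for huge files.
    with open('in.txt') as fin, open('out.txt', 'w') as fout:
        for line in fin:
            fout.write('\t'.join(WORD_CHAR.findall(line)) + '\n')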
    

    The output is stored in the file out.txt.
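If you would rather avoid regular expressions, a minimal alternative sketch is shown below. It assumes every non-whitespace character is a data symbol (unlike `\w`, it also keeps punctuation). The names `retab` and `retab_file` and the 16 MiB buffer size are illustrative choices, not part of the answer above. Streaming line by line already keeps memory use low; larger I/O buffers may reduce the number of disk reads and writes, but whether that speeds things up depends on your machine, so treat the buffer size as something to tune.

```python
def retab(line: str) -> str:
    """Return the non-whitespace characters of `line` joined by single tabs.

    Existing tabs, spaces and the trailing newline are dropped first, so a
    run such as "AT" ends up as two tab-separated characters.
    """
    return "\t".join(ch for ch in line if not ch.isspace())


def retab_file(src: str, dst: str, buffer_size: int = 16 * 1024 * 1024) -> None:
    """Stream `src` to `dst` one line at a time.

    `buffer_size` (16 MiB by default) is an arbitrary assumption; it only
    sets the size of the underlying read/write buffers, not how much of
    the file is held in memory as Python strings.
    """
    with open(src, buffering=buffer_size) as fin, \
         open(dst, "w", buffering=buffer_size) as fout:
        for line in fin:
            # Each output line is rebuilt from scratch, then newline-terminated.
            fout.write(retab(line) + "\n")
```

Usage: `retab_file("in.txt", "out.txt")` writes the same result as the regex version for input that contains only letters and tabs.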