python regex python-2.7 pyscripter

Python program to extract text from a text file?


I have a text file that I obtained by converting a .srt file. Its content is as follows:

1
0:0:1,65 --> 0:0:7,85
Hello, my name is Gareth, and in this
video, I'm going to talk about list comprehensions


2
0:0:7,85 --> 0:0:9,749
in Python.

I want only the words in the text file, written one per line to a new text file op.txt, like this:

Hello
my
name 
is
Gareth
and

and so on.

This is the program I'm working on:

import os, re
f= open("D:\captionsfile.txt",'r')
k=f.read()
g=str(k)
f.close()
w=re.search('[a-z][A-Z]\s',g)
fil=open('D:\op.txt','w+')
fil.append(w)
fil.close()

But the output I get for this program is:

None
None
None

Solution

  • Assuming in.txt is your text file, and that it is acceptable for m (short for am in I'm) to count as a separate word, you can use

    import re
    
    with open('in.txt') as intxt:
        data = intxt.read()
    
    # [a-zA-Z] matches ASCII letters only; a range written as A-z would also match [ \ ] ^ _ and `
    x = re.findall('[a-zA-Z]+', data)
    print(x)
    

    which will produce

    ['Hello', 'my', 'name', 'is', 'Gareth', 'and', 'in', 'this', 'video', 'I', 'm', 'going', 'to', 'talk', 'about', 'list', 'comprehensions', 'in', 'Python']
    

    You can now write x to a new file with:

    with open('out.txt', 'w') as outtxt:
        outtxt.write('\n'.join(x))
    

    To get

    I'm
    

    instead of

    I
    m
    

    you can use re.findall("[a-zA-Z']+", data)
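
Putting the steps together, here is a minimal sketch of the whole task, assuming the captions contain only ASCII letters and that contractions such as I'm should stay whole. The names extract_words and write_words are hypothetical helpers, not part of any library. Index and timestamp lines drop out on their own because they contain no letters.

```python
import re

# Assumption: a "word" is a run of ASCII letters and apostrophes.
WORD_RE = re.compile(r"[A-Za-z']+")


def extract_words(text):
    """Return the words in text, in order of appearance."""
    return WORD_RE.findall(text)


def write_words(in_path, out_path):
    """Read in_path and write its words to out_path, one per line."""
    with open(in_path) as src:
        words = extract_words(src.read())
    with open(out_path, "w") as dst:
        dst.write("\n".join(words))
    return words


# Sample taken from the captions in the question.
sample = (
    "1\n"
    "0:0:1,65 --> 0:0:7,85\n"
    "Hello, my name is Gareth, and in this\n"
    "video, I'm going to talk about list comprehensions\n"
)
print(extract_words(sample))
```

Calling write_words('in.txt', 'out.txt') would then produce the one-word-per-line file. Note that this pattern also matches an apostrophe standing on its own, so text with stray quote marks may need an extra filter.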