
Find words in a text file using regex and write them to a different text file


I need to find words such as inherited, INHERITANCE, Inheritable, etc. in a text file (origin.txt) using regex, and then write each word, together with the number of the line where it was found, to a new text file (origin_spp.txt).

This is my code:

re_pattern_string = r'(?:inherit|INHERIT|Inherit)*\w'

print('Opening origin.txt')
with open('origin.txt', 'r') as in_stream:
    print('Opening origin_spp.txt')
    with open('origin_spp.txt', 'w') as out_stream:
        for num, line in enumerate (in_stream):
        re_pattern_object = re.compile(re_pattern_string)
        line = line.strip()
        inherit_list = line.split()
        temp_list = re_pattern_object.findall('line')
        complete = origin_list.append('temp_list')
        for word in temp_list:
            out_stream.write(str(num) + '\t{0}\n'.format(word))

print("Done!")
print('origin.txt is closed?', in_stream.closed)
print('origin_spp.txt is closed?', out_stream.closed)

if __name__ == '__main__':
    print(temp_list)

Can you help me, please? I am not getting any output and I do not know where the error is.

Thank you in advance

I need to write the words found in origin.txt to a different text file.

This new file must contain the line number from origin.txt plus the matched word(s).


Solution

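  • Your code had some problems:

    • Calling re.compile inside the for loop is redundant; compile the pattern once, before the loop.
    • In re_pattern_object.findall('line') and origin_list.append('temp_list'), don't wrap the variable names in quotes. Quoted, they are string literals, so findall searches the text 'line' instead of the contents of the variable.
    • findall works on a whole string, so you don't need to split a line into words before searching it; iterating over the lines is only needed to get the line numbers.
    • In the pattern, the * applies to the whole (?:inherit|INHERIT|Inherit) group, which makes the group optional, so the pattern matches any single word character. The * belongs on \w instead.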
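
To make the quoting point above concrete, here is a minimal sketch. The sample line and the variable names quoted_result and unquoted_result are made up for illustration; they are not from the question.

```python
import re

pattern = re.compile(r'(?:inherit|INHERIT|Inherit)\w*')
line = 'Inheritable traits are INHERITED'  # hypothetical sample line

# Quoted: this searches the literal text 'line', which contains no match.
quoted_result = pattern.findall('line')

# Unquoted: this searches the contents of the variable.
unquoted_result = pattern.findall(line)

print(quoted_result)    # []
print(unquoted_result)  # ['Inheritable', 'INHERITED']
```

The same applies to append: appending 'temp_list' stores that nine-character string, not the list of matches.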

    Because you didn't provide sample input and expected output, I'm guessing at what you want:

    import re
    
    # The outer group captures the whole word; the inner group captures the suffix.
    re_pattern_string = r'((?:inherit|INHERIT|Inherit)(\w*))'
    originmain_list = []
    # Compile the pattern once, outside the loop.
    re_pattern_object = re.compile(re_pattern_string)
    print('Opening origin.txt')
    with open('origin.txt', 'r') as in_stream:
        print('Opening origin_spp.txt')
        with open('origin_spp.txt', 'w') as out_stream:
            # enumerate counts from 0, so the first line is number 0.
            for num, line in enumerate(in_stream):
                # With two groups, findall returns (whole_word, suffix) tuples.
                temp_list = re_pattern_object.findall(line)
                for word in temp_list:
                    out_stream.write(str(num) + '\t{0}\n'.format(word[0]))
                    originmain_list.append((num, word[0]))
    
    print("Done!")
    print('origin.txt is closed?', in_stream.closed)
    print('origin_spp.txt is closed?', out_stream.closed)
    print(originmain_list)
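
The code above indexes each result with word[0]. That is because findall returns tuples when the pattern has more than one capturing group. A small sketch of the difference, using a made-up sample string:

```python
import re

text = 'inheritxxx and INHERIT'  # hypothetical sample text

# Two capturing groups: findall returns one (whole_word, suffix) tuple per match.
with_groups = re.findall(r'((?:inherit|INHERIT|Inherit)(\w*))', text)

# No capturing groups: findall returns the matched strings directly.
without_groups = re.findall(r'(?:inherit|INHERIT|Inherit)\w*', text)

print(with_groups)     # [('inheritxxx', 'xxx'), ('INHERIT', '')]
print(without_groups)  # ['inheritxxx', 'INHERIT']
```

If the suffix is not needed on its own, dropping the capturing groups should let you write the match directly, without the [0] index.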
    
    

    If origin.txt contains:

    inheritxxxxxxx some text INHERITccccc some text
    Inheritzzzzzzzz some text
    inherit some text INHERIT some text
    Inherit some text
    

    the output in origin_spp.txt will be:

    0   inheritxxxxxxx
    0   INHERITccccc
    1   Inheritzzzzzzzz
    2   inherit
    2   INHERIT
    3   Inherit
    

    The command line output will be:

    Opening origin.txt
    Opening origin_spp.txt
    Done!
    origin.txt is closed? True
    origin_spp.txt is closed? True
    [(0, 'inheritxxxxxxx'), (0, 'INHERITccccc'), (1, 'Inheritzzzzzzzz'), (2, 'inherit'), (2, 'INHERIT'), (3, 'Inherit')]
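
A possible variation, offered only as a sketch. It assumes you want any capitalization to match (for example iNhErItAbLe), want matches only at the start of a word, and want line numbers that start at 1. None of that is stated in the question. The names PATTERN, find_matches and sample are hypothetical.

```python
import re

# Assumption: match "inherit..." in any capitalization, and only at the
# start of a word (hence \b). Adjust if that is not what you want.
PATTERN = re.compile(r'\binherit\w*', re.IGNORECASE)

def find_matches(lines):
    """Yield (line_number, word) pairs, counting lines from 1."""
    for num, line in enumerate(lines, start=1):
        for match in PATTERN.finditer(line):
            yield num, match.group()

# Hypothetical sample input standing in for the lines of origin.txt.
sample = [
    'inheritxxxxxxx some text INHERITccccc some text\n',
    'Inheritzzzzzzzz some text\n',
    'iNhErItAbLe some text\n',
]
results = list(find_matches(sample))
print(results)
```

Because find_matches accepts any iterable of lines, an open file object such as in_stream can be passed in place of sample, and each (num, word) pair can then be written to out_stream inside the with blocks.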