
How to read a specific portion of a txt file in Python?


I need to extract a portion of text from a txt file.
The file looks like this:

STARTINGWORKING DD / MM / YYYY HH: MM: SS
... text lines ...
... more text lines ...
STARTINGWORKING DD / MM / YYYY HH: MM: SS
... text lines I want ...
... more text lines that I want ...

  • The file starts with a STARTINGWORK line and ends with text lines.
    I need to extract the final portion of text after the last STARTINGWORK line, without the STARTINGWORK string itself.

I tried using three for loops (one to find the start, another to read the lines in between, and a last one to find the end):

     import os

     file = "records.txt"
     if file.endswith(".txt"):
       if os.path.exists(file):
         lines = [line.rstrip('\n') for line in open(file)]
         for line in lines:
             pass  # extract the portion

Solution

  • You can keep a variable that holds all the lines read since the last STARTINGWORK.
    When you finish processing the file, it contains just what you need.

    You do not need to read all the lines into a list first. You can iterate directly over the open file, which yields one line at a time:

    result = []
    with open(file) as f:
        for line in f:
            if line.startswith("STARTINGWORK"):
                result = []       # Discard what has accumulated so far
            result.append(line)  # Add the line just read
    print("".join(result))
    

    result now holds everything from the last STARTINGWORK line onward, including that line. Use result[1:] if you want to drop the initial STARTINGWORK line.
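
As a minimal sketch of what the slice does, the same loop can be run against an in-memory sample. The sample text, its dates, and the name wanted are made up for illustration; io.StringIO only stands in for the open file.

```python
import io

# Hypothetical sample standing in for records.txt
sample = io.StringIO(
    "STARTINGWORKING 01/01/2020 08:00:00\n"
    "old line 1\n"
    "old line 2\n"
    "STARTINGWORKING 02/01/2020 09:30:00\n"
    "wanted line 1\n"
    "wanted line 2\n"
)

result = []
for line in sample:
    if line.startswith("STARTINGWORK"):
        result = []          # a new section starts: forget the previous one
    result.append(line)      # the marker line itself ends up at index 0

# Skip the marker line and strip the trailing newlines
wanted = [line.rstrip("\n") for line in result[1:]]
print(wanted)  # ['wanted line 1', 'wanted line 2']
```

Here result[0] is the last STARTINGWORKING line, so slicing from index 1 leaves only the text that follows it.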

    - The same idea, wrapped in a function:

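    # list that accumulates the lines of the current section
    result = []
    
    # reset the list at each marker line, then store the line
    def appendlines(line, result, word):
      if line.startswith(word):
        del result[:]
      result.append(line)
      return line, result
    
    with open(file, "r") as lines:
      for line in lines:
        appendlines(line, result, "STARTINGWORK")
    # drop the marker line and strip trailing newlines
    new_result = [line.rstrip("\n") for line in result[1:]]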
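
If this is needed in more than one place, the approach could be sketched as a self-contained function that avoids the shared list. The name read_last_section, its marker parameter, the temporary demo file, and the choice to return an empty list when no marker line is found are all assumptions made for illustration, not part of the original answer.

```python
import os
import tempfile


def read_last_section(path, marker="STARTINGWORK"):
    """Return the lines after the last line starting with `marker`.

    The marker line itself is left out and trailing newlines are stripped.
    Assumption: if no marker line exists, an empty list is returned.
    """
    section = []
    found = False
    with open(path, "r") as f:
        for line in f:
            if line.startswith(marker):
                section = []      # new section: discard the previous one
                found = True
            else:
                section.append(line.rstrip("\n"))
    return section if found else []


# Demo with a throwaway file (hypothetical contents)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "records.txt")
    with open(path, "w") as f:
        f.write(
            "STARTINGWORKING 01/01/2020 08:00:00\n"
            "old line\n"
            "STARTINGWORKING 02/01/2020 09:30:00\n"
            "wanted line 1\n"
            "wanted line 2\n"
        )
    demo = read_last_section(path)

print(demo)  # ['wanted line 1', 'wanted line 2']
```

Because the marker line is never appended, no result[1:] slice is needed afterwards.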