Parsing a tagged file in Python


I would like to export contents of a text file into a list, where each comment is a separate item of the list.

Example of file:

[start] Hello this is comment number 1. aklsdfjkaldsfjasdfklasdflj [start] This is another comment. adfkladsfkjlasjdf [start] this is another.

How can I go about doing this? I have tried:

  1. Replaced each instance of [start] with '\n[start]'

  2. Looping through the string line by line, creating a new 'content' variable.

  3. IF line.startswith('\n[start]') AND IF content variable is not empty, append to a list called 'comments'

  4. EMPTY the content variable

  5. Append the current line to content

ELSE (i.e. if line does not start with [start] but is more lines of the same comment): continue to append to 'content'

I was hoping the above approach would work, but have the following issues:

  1. After replacing each instance of [start] with '\n[start]', I cannot see in the PyCharm debugger that the replacement actually worked.
  2. My 'comments' list is empty after running the above.

Solution

  • First, this is how a file is read in Python:

    with open("file", "r") as the_file:
        for line in the_file:
            print(line.strip())
    

    With the code above, you read the file line by line and print each line to the console.

    Now you have one or more lines that you want to split on a certain value (in your case, [start]):

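    with open("file", "r") as the_file:
        list_of_contents = []
        for line in the_file:
            # Split each line on the literal tag and collect all the pieces.
            # The text before the first [start] (often an empty string) is kept too.
            list_of_contents.extend(line.strip().split("[start]"))

        print(list_of_contents)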
    

    To achieve the same thing in a more Pythonic way:

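    with open("file", "r") as the_file:
        lines = the_file.readlines()
        # A starred expression cannot be used inside a list comprehension,
        # so flatten the per-line splits with a nested loop instead.
        list_of_contents = [part for line in lines for part in line.strip().split("[start]")]
        print(list_of_contents)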
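
Note that splitting line by line keeps the empty string that precedes the first [start], and it cuts a comment in two if the comment continues on the next line. Below is a minimal sketch that avoids both, assuming the file is small enough to read in one go. The helper name `split_comments` and its `tag` parameter are illustrative and not from the question, and the sample string is the question's example with one line break added to show a comment spanning two lines.

```python
def split_comments(text, tag="[start]"):
    """Return the non-empty pieces of text found between occurrences of tag."""
    # str.split treats the tag as a literal string, so the brackets need no escaping.
    pieces = text.split(tag)
    # " ".join(piece.split()) collapses newlines and repeated spaces inside a comment.
    cleaned = (" ".join(piece.split()) for piece in pieces)
    # Drop empty pieces, such as the one before the first tag.
    return [comment for comment in cleaned if comment]


# Sample text based on the question, with a line break inside the second comment.
sample = (
    "[start] Hello this is comment number 1. aklsdfjkaldsfjasdfklasdflj "
    "[start] This is another comment.\nadfkladsfkjlasjdf [start] this is another."
)
comments = split_comments(sample)
print(comments)
```

With a real file, the same helper could be called on the whole contents inside the `with` block, for example `split_comments(the_file.read())`.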