How to split a txt into custom paragraphs (and then insert them into excel columns)?


I need to retrieve each section of a text file. The sections are identifiable because they start with the words 'First', 'Second', 'Third', and so on. I then need to insert each section into a different column in Excel. For example, the text reads:

First blablablabla. Then blablabla. Last blabla.

Second blabla. Then blabla. Last blabla.

Third blabla. also blabla. Fourth bla.

I know this code is entirely wrong but it's what I have tried so far:

with open("adress","r", encoding="utf8") as f:
  lines = f.readlines()

  for i in lines:
    words= i.split('\n\n')
    print(words)

    for i in words:
        print(i,i=='First')

Solution

  • This code will split your text correctly:

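    with open("address", "r", encoding="utf8") as file:
        # Read the whole file as one string so the blank lines survive
        text = file.read()

    # Paragraphs are separated by a blank line, i.e. two newlines in a row
    sections = text.split('\n\n')
    for section in sections:
        print(section)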

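    Your attempt fails because readlines() has already split the text at every newline, so no element of lines can contain two newlines in a row, and split('\n\n') has nothing left to split on. Reading the whole file with read() keeps the blank lines intact.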
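
For the Excel half of the question, here is a minimal sketch. It assumes that a CSV file is acceptable, since Excel opens CSV files and places each comma-separated field in its own column. The helper names split_sections and sections_to_csv_row are hypothetical, and the sample text is the one from the question.

```python
import csv
import io


def split_sections(text):
    """Split text into sections separated by blank lines."""
    # Assumption: Windows line endings may be present, so normalise them first.
    text = text.replace("\r\n", "\n")
    # Drop empty pieces and surrounding whitespace left over from extra blank lines.
    return [part.strip() for part in text.split("\n\n") if part.strip()]


def sections_to_csv_row(sections):
    """Return one CSV row in which every section occupies its own column."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(sections)
    return buffer.getvalue()


# Sample text taken from the question.
sample = (
    "First blablablabla. Then blablabla. Last blabla.\n\n"
    "Second blabla. Then blabla. Last blabla.\n\n"
    "Third blabla. also blabla. Fourth bla.\n"
)

sections = split_sections(sample)
row = sections_to_csv_row(sections)
print(row)
```

To produce an actual file, pass a file object opened with newline="" to csv.writer instead of the StringIO buffer. If a native .xlsx workbook is required, a third-party library such as openpyxl would be needed instead.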