python regex split

Python regex to match one or more line breaks?


I want to split a text file into paragraphs, separated by 1 or more empty lines. For example:

# file.txt
"Paragraph1
Some text

Paragraph2
More text

Paragraph3
some more text"

I tried using a regex, but I'm not sure I'm doing it correctly. In the example below I try to print only the second paragraph, but I get a "list index out of range" error, while printing p[0] prints the whole text file. What am I doing wrong? Should I use a different regular expression, or some other method to split the file into paragraphs?

with open(file) as f:
    text = f.read()

p = text.split("[\r\n]+")
print(p[1])

Solution

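  • Use re.split(). str.split() treats its argument as a literal separator string, not a regular expression, so the text "[\r\n]+" is never found and the whole file comes back as a single list element. That is why p[0] prints everything and p[1] raises the index error.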

    >>> import re
    >>> re.split(r'[\r\n][\r\n]+', text)
    ['Paragraph1\nSome text', 'Paragraph2\nMore text', 'Paragraph3\nsome more text']
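
If the file may contain blank lines that hold only spaces or tabs, or leading and trailing blank lines, a slightly more tolerant pattern can help. The following is a minimal sketch, not part of the original answer: the helper name split_paragraphs and the pattern \n\s*\n are assumptions chosen for illustration, and the sample text stands in for the contents of file.txt.

```python
import re

def split_paragraphs(text):
    """Split text into paragraphs separated by one or more blank lines.

    Hypothetical helper for illustration; the name and the pattern are
    assumptions, not taken from the original answer.
    """
    # Strip first so leading/trailing blank lines do not yield empty items.
    text = text.strip()
    if not text:
        return []
    # A newline, any run of whitespace (further newlines, spaces, tabs,
    # or a stray '\r'), then another newline: i.e. at least one blank line.
    parts = re.split(r'\n\s*\n', text)
    # Trim each paragraph, which also drops a trailing '\r' left by CRLF input.
    return [p.strip() for p in parts]

# Sample input standing in for the contents of file.txt.
sample = "Paragraph1\nSome text\n\nParagraph2\nMore text\n\n\nParagraph3\nsome more text\n"
paragraphs = split_paragraphs(sample)
print(paragraphs[1])
# prints:
# Paragraph2
# More text
```

Note that each paragraph is passed through strip(), so any indentation on a paragraph's first line is lost; drop that call if leading whitespace matters.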