Get paragraphs after a certain symbol: need better output


I'm a Python beginner.

I have this code that gets all paragraphs that follow a line of asterisks such as '****':

import re

file = open('/Users/simon/DRIVE/ARCHIVED/Tools at Hand/PASTE.txt', mode='r')

result = [s.strip() for s in re.findall(r'^\*{4,}((?:\r?\n(?!\s*$|\*{4}).+)*)', file.read(), re.MULTILINE)]

print(result)
file.close()

Input:

Lorem ipsum dolor sit amet, consectetur adipiscing elit.

****
Sed id placerat magna.

*******
Pellentesque in ex ac urna tincidunt tristique. 

Etiam dapibus faucibus gravida.

The output I need:

Sed id placerat magna.

Pellentesque in ex ac urna tincidunt tristique. 

The output I get:

['Sed id placerat magna.', 'Pellentesque in ex ac urna tincidunt tristique.']

I can't seem to figure out how to print each paragraph on its own, separated by a blank line, instead of as a list.


Solution

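  • re.findall returns a list. Because the pattern contains one capture group, each item in the list is the text matched by that group, which is why print(result) shows a list of quoted strings.

    You could, for example, unpack the list with * when calling print and set the separator to two newlines, so the paragraphs are printed with a blank line between them.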

    import re
    
    file = open('/Users/simon/DRIVE/ARCHIVED/Tools at Hand/PASTE.txt', mode='r')
    
    result = [s.strip() for s in re.findall(r'^\*{4,}((?:\r?\n(?!\s*$|\*{4}).+)*)', file.read(), re.MULTILINE)]
    
    print(*result, sep="\n\n")
    
    file.close()
    

    Output

    Sed id placerat magna.
    
    Pellentesque in ex ac urna tincidunt tristique.
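
As a self-contained variation on the same idea, the sketch below joins the list with "\n\n".join(...) instead of unpacking it, which gives you the formatted text as a single string that can be printed or written to a file. The sample text is inlined here (an assumption, so the snippet runs without PASTE.txt), and the helper name paragraphs_after_marker is hypothetical, not part of the original code.

```python
import re

# Same pattern as in the question: a line starting with 4+ asterisks,
# followed by the consecutive non-blank lines after it (captured).
PATTERN = re.compile(r'^\*{4,}((?:\r?\n(?!\s*$|\*{4}).+)*)', re.MULTILINE)


def paragraphs_after_marker(text):
    """Return the stripped paragraph that follows each asterisk line.

    Hypothetical helper name, introduced only for this illustration.
    """
    return [s.strip() for s in PATTERN.findall(text)]


# Inlined copy of the sample input from the question (assumption: in real
# use you would read this from the file, e.g. inside a `with open(...)` block).
sample = (
    "Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n"
    "\n"
    "****\n"
    "Sed id placerat magna.\n"
    "\n"
    "*******\n"
    "Pellentesque in ex ac urna tincidunt tristique. \n"
    "\n"
    "Etiam dapibus faucibus gravida.\n"
)

result = paragraphs_after_marker(sample)

# Build one string with a blank line between paragraphs, then print it.
output = "\n\n".join(result)
print(output)
```

For display purposes this is equivalent to print(*result, sep="\n\n"); joining is mainly useful when you want to keep the formatted text around rather than only print it.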