I'm a Python beginner.
I have this code that gets all the paragraphs that follow a line of four or more asterisks (****):
import re
file = open('/Users/simon/DRIVE/ARCHIVED/Tools at Hand/PASTE.txt', mode='r')
result = [s.strip() for s in re.findall(r'^\*{4,}((?:\r?\n(?!\s*$|\*{4}).+)*)', file.read(),
                                        re.MULTILINE)]
print(result)
file.close()
Input:
Lorem ipsum dolor sit amet, consectetur adipiscing elit.
****
Sed id placerat magna.
*******
Pellentesque in ex ac urna tincidunt tristique.

Etiam dapibus faucibus gravida.
The output I need:
Sed id placerat magna.

Pellentesque in ex ac urna tincidunt tristique.
The output I get:
['Sed id placerat magna.', 'Pellentesque in ex ac urna tincidunt tristique.']
I can't figure out how to print each paragraph on its own, separated by a blank line, instead of as a list.
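re.findall returns a list of the capture-group values, one string per match, so print(result) shows that list with its brackets and quotes.
You could, for example, print the list with a * in front of it to unpack the result into separate arguments, and set the separator to two newlines.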
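To see what the * and sep do in isolation, here is a small sketch with the two paragraphs hard-coded as a list (standing in for result, which would normally come from the file). It also shows the equivalent str.join spelling, which is handy if you want the text as one string instead of printing it directly.

```python
# Hard-coded stand-in for the list that re.findall produces
paragraphs = ["Sed id placerat magna.", "Pellentesque in ex ac urna tincidunt tristique."]

# The * unpacks the list, so this call is the same as
# print(paragraphs[0], paragraphs[1], sep="\n\n")
print(*paragraphs, sep="\n\n")

# Equivalent result as a single string: join the items with a blank line between them
text = "\n\n".join(paragraphs)
print(text)
```

Both print calls write the same two paragraphs with one blank line between them.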
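import re

# A with block closes the file automatically, so file.close() is not needed
with open('/Users/simon/DRIVE/ARCHIVED/Tools at Hand/PASTE.txt', mode='r') as file:
    text = file.read()

result = [s.strip() for s in re.findall(r'^\*{4,}((?:\r?\n(?!\s*$|\*{4}).+)*)', text, re.MULTILINE)]
# * unpacks the list into separate arguments; sep puts a blank line between them
print(*result, sep="\n\n")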
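If you want to try the pattern without a file, the sketch below runs it on the sample input stored in a string. One assumption: it puts a blank line between "Pellentesque ..." and "Etiam ...", because the (?!\s*$|\*{4}) lookahead in the pattern stops the capture at the first blank line (or the next line of asterisks), and that is what keeps "Etiam ..." out of the expected output.

```python
import re

# Sample input from the question. Assumption: a blank line separates
# "Pellentesque ..." from "Etiam ...", so the second capture stops there.
text = """Lorem ipsum dolor sit amet, consectetur adipiscing elit.
****
Sed id placerat magna.
*******
Pellentesque in ex ac urna tincidunt tristique.

Etiam dapibus faucibus gravida.
"""

# Same pattern as above: a line starting with 4+ asterisks, then capture every
# following line that is not blank and does not itself start with 4 asterisks.
pattern = r'^\*{4,}((?:\r?\n(?!\s*$|\*{4}).+)*)'
result = [s.strip() for s in re.findall(pattern, text, re.MULTILINE)]

print(*result, sep="\n\n")
```

Without that blank line, "Etiam dapibus faucibus gravida." would be captured as part of the second paragraph.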
Output:
Sed id placerat magna.

Pellentesque in ex ac urna tincidunt tristique.