My program takes a text file and splits it into a list of sentences using split('.'), so it splits wherever it finds a full stop. However, this can be inaccurate.
For example:
text = 'i love carpets. In fact i own 2.4 km of the stuff.'
currently gives
listOfSentences = ['i love carpets', 'In fact i own 2', '4 km of the stuff']
but what I want is
listOfSentences = ['i love carpets', 'In fact i own 2.4 km of the stuff']
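A minimal reproduction of the problem, assuming the program strips surrounding whitespace and drops empty pieces (the question does not show that part of the code). It is shown next to one simple alternative: splitting only at a full stop that is followed by whitespace, so the "." inside "2.4" is left alone. The trailing rstrip('.') is an assumption made only to match the desired output above.

```python
import re

text = 'i love carpets. In fact i own 2.4 km of the stuff.'

# Naive approach: every "." is treated as a sentence boundary,
# including the decimal point in "2.4".
naive = [s.strip() for s in text.split('.') if s.strip()]
print(naive)  # ['i love carpets', 'In fact i own 2', '4 km of the stuff']

# Alternative: split only on whitespace that directly follows a "."
# (the lookbehind keeps the "." attached to its sentence), then drop it.
better = [s.rstrip('.') for s in re.split(r'(?<=\.)\s+', text.strip())]
print(better)  # ['i love carpets', 'In fact i own 2.4 km of the stuff']
```

This still breaks on abbreviations such as "Mr. Smith", which is what the extra lookbehinds in the regex answer guard against.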
My question is: how do I split at the end of each sentence rather than at every full stop?
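If your sentences end with "." or "?" followed by whitespace, you can try a regex that splits only on that whitespace while skipping abbreviations such as "i.e." and "Mr.":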
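import re

text = "your text here. i.e. something."
# Split on whitespace that follows "." or "?", unless the "." belongs
# to an abbreviation such as "i.e." or "Mr."
sentences = re.split(r'(?<!\w\.\w.)(?<![A-Z][a-z]\.)(?<=\.|\?)\s', text)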
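Applied to the sentence from the question, the pattern above keeps "2.4" intact. The sketch below writes the same pattern with re.VERBOSE so each lookbehind can be commented on its own line. The helper name split_sentences is hypothetical, and its rstrip('.') is an assumption added only to match the desired output in the question.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE
# (whitespace in the pattern is ignored and "#" starts a comment).
SENTENCE_END = re.compile(r"""
    (?<!\w\.\w.)       # not after an abbreviation like "i.e." or "e.g."
    (?<![A-Z][a-z]\.)  # not after a title like "Mr." or "Dr."
    (?<=\.|\?)         # must come right after "." or "?"
    \s                 # split on this whitespace character
""", re.VERBOSE)


def split_sentences(text):
    """Hypothetical helper: split into sentences and drop each closing '.'."""
    return [s.rstrip('.') for s in SENTENCE_END.split(text.strip())]


print(split_sentences('i love carpets. In fact i own 2.4 km of the stuff.'))
# ['i love carpets', 'In fact i own 2.4 km of the stuff']

print(SENTENCE_END.split('I met Mr. Smith. He owns 2.4 km of carpet.'))
# ['I met Mr. Smith.', 'He owns 2.4 km of carpet.']
```

Note that the pattern only treats "." and "?" as sentence endings; "!" would have to be added to the positive lookbehind, and no fixed pattern will catch every possible abbreviation.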
Source: Python - RegEx for splitting text into sentences (sentence-tokenizing)