I have text that looks like this:
"I am an engineer. I am skilled in ASP.NET. I also know Node.js.But I don't have much experience. "
Here, "ASP.NET" and "Node.js" are to be treated as words. Also, there is no space before "But I...", but it should be treated as a separate sentence.
The expected output is:
["I am an engineer"," I am skilled in ASP.NET","I also know Node.js","But I don't have much experience"]
Is there a way of doing this?
For your current input, you can use re.split() with a specific regex pattern:
import re
s = "I am an engineer. I am skilled in ASP.NET. I also know Node.js.But I don't have much experience. "
result = re.split(r'\.(?=\s?[A-Z][^.]*? )', s)
print(result)
The output:
['I am an engineer', ' I am skilled in ASP.NET', ' I also know Node.js', "But I don't have much experience. "]
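(?=\s?[A-Z][^.]*? )
- a positive lookahead assertion. It ensures that the sentence delimiter . is followed by an optional whitespace character, an uppercase letter, and then a space that occurs before any further period, i.e. by the first word of the next sentence. The period in ASP.NET fails this check because another period follows "NET" before any space, and the period in Node.js fails because "js" does not start with an uppercase letter.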
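Note that the result above still carries leading spaces and a trailing ". " on the last element, because the final period is not followed by another sentence. If you want clean sentences, one possible sketch is to post-process the pieces. The helper name split_sentences is my own, not part of the re module, and this is tuned to the sample input rather than to arbitrary text:

```python
import re

def split_sentences(text):
    # Hypothetical helper: split with the same lookahead pattern as above.
    parts = re.split(r'\.(?=\s?[A-Z][^.]*? )', text)
    # Trim surrounding whitespace and the trailing period of the last sentence.
    cleaned = [p.strip().rstrip('.').strip() for p in parts]
    # Drop any empty pieces that might remain.
    return [p for p in cleaned if p]

s = "I am an engineer. I am skilled in ASP.NET. I also know Node.js.But I don't have much experience. "
print(split_sentences(s))
```

For the sample string this should give the four sentences without leading spaces or a trailing period. Sentences that end in "?" or "!", or that contain abbreviations followed by a capitalized word, would need a different pattern.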