python regex nlp nltk sublimetext3

Find sentences starting with lowercase and replace them with correct sentence case (regex or Sublime Text)


I have text in which some sentences start with a lowercase letter. I need to find them and convert them to correct sentence case. Some of the punctuation is also incorrect, e.g. a sentence starts right after a full stop with no space in between.

For example:

.this sentence
and this.also this. and this.This one is not.

should be replaced with:

.This sentence
And this.Also this. And this.This one is not.

A Sublime Text 3 solution, a regex, or a Python NLTK solution would all be suitable.

I tried the following, but it is slow and does not find sentences that have no space after the full stop.

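# sent_tokenize needs NLTK's Punkt tokenizer data to be downloaded
from nltk.tokenize import sent_tokenize

text = """kjdshkjhf. this sentence
and this.also this. and this. This one is not."""

# Print every tokenized sentence that starts with a lowercase letter
sentences = sent_tokenize(text)
for sentence in sentences:
    if sentence[0].islower():
        print(sentence)
        print("****")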

Solution

  • You can use this pattern:

    ^([^a-zA-Z]*)([a-z])
    


    and use $1\U$2 as the substitution (\U uppercases the text that follows it).


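    Update: if you also want to capture the first lowercase letter after each period (.), you can use the pattern below. Because the letter is then in group 2 or group 4 depending on which alternative matched, the substitution should become something like $1$3\U$2$4.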

    ^([^a-zA-Z]*)([a-z])|(\.\s*)([a-z])
    

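For the Python side of the question, here is a minimal sketch of the same idea using the standard `re` module. Python's `re.sub` does not support `\U` in replacement strings, so this sketch uses a replacement function instead. The names `PATTERN`, `capitalize_sentence_starts` and `sample` are made up for illustration and are not part of the answer above.

```python
import re

# Same pattern as in the update above. re.MULTILINE makes ^ match at the
# start of every line, not only at the start of the whole string.
PATTERN = re.compile(r"^([^a-zA-Z]*)([a-z])|(\.\s*)([a-z])", re.MULTILINE)


def capitalize_sentence_starts(text):
    """Uppercase the first lowercase ASCII letter of each line and after each period."""

    def fix(match):
        # Only one alternative matches at a time, so pick the groups that are set.
        if match.group(2) is not None:
            prefix, letter = match.group(1), match.group(2)
        else:
            prefix, letter = match.group(3), match.group(4)
        return prefix + letter.upper()

    return PATTERN.sub(fix, text)


sample = ".this sentence\nand this.also this. and this.This one is not."
print(capitalize_sentence_starts(sample))
# .This sentence
# And this.Also this. And this.This one is not.
```

Note that this is purely pattern-based: any lowercase letter directly after a period is uppercased, so abbreviations such as "i.e." or domain names such as "example.com" would be changed too. It also only handles ASCII letters, because the character classes are `[a-z]` and `[a-zA-Z]`.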