I have a large corpus that is segmented at the sentence level, meaning each line contains one sentence. Some of these lines end with a full stop (period) and some don't. I am looking for an efficient way to add a full stop to the end of every line that doesn't already end with one, for instance a shell script that uses sed or awk.
Sed is probably the simplest approach for this:
$ cat file
sentence one
sentence two.
sentence three
$ sed 's/[^.]$/&./' file
sentence one.
sentence two.
sentence three.
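On lines that don't end with a period (matched by [^.]$), this replaces the last character with itself (&) followed by a period. Empty lines are left untouched, because [^.] needs a character to match. Watch out for lines with trailing whitespace: there the period may be the last visible character without being the last character on the line, so a second period would be appended after the spaces.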
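One way to handle the trailing-whitespace case is to strip that whitespace first and only then test the last character. The following is a minimal sketch, assuming it is acceptable to drop trailing spaces and tabs from the corpus; the sample file contents and the variable names tmp and result are made up for illustration.

```shell
# Hypothetical sample: line 2 ends with a period plus spaces, line 3 with a tab.
tmp=$(mktemp)
printf 'sentence one\nsentence two.  \nsentence three\t\n' > "$tmp"

# First expression: delete trailing whitespace.
# Second expression: append a period when the last character is not one.
result=$(sed 's/[[:space:]]*$//; s/[^.]$/&./' "$tmp")
printf '%s\n' "$result"

rm -f "$tmp"
```

If the trailing whitespace has to be preserved, this sketch is not suitable as written.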
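Edit: with awk I would do the following. Setting FS to the empty string makes each character its own field, which gawk supports but POSIX leaves unspecified: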
$ awk '/[^.]$/{$(NF+1)="."}1' FS= OFS= file
sentence one.
sentence two.
sentence three.
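The awk command above relies on an empty FS splitting the line into one field per character, which POSIX leaves unspecified. A sketch that sidesteps this by appending to $0 directly, and so should behave the same in any awk, might look like this; the sample input and the names tmp and awk_result are illustrative only.

```shell
# Hypothetical sample, including an empty line that should stay empty.
tmp=$(mktemp)
printf 'sentence one\nsentence two.\n\nsentence three\n' > "$tmp"

# If the last character is not a period, append one to the whole record.
# The trailing 1 is an always-true pattern that prints every line.
awk_result=$(awk '/[^.]$/ { $0 = $0 "." } 1' "$tmp")
printf '%s\n' "$awk_result"

rm -f "$tmp"
```

As with the sed version, empty lines are left unchanged because [^.] needs a character to match.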