linux shell - adding full stop (period) to end of lines which do not end with full stop, in a corpus


I have a big corpus that is segmented at the sentence level, so each line contains one sentence. Some of these lines end with a full stop (period) and some don't. I am looking for an efficient way to add a full stop to the end of every line that doesn't already end with one, for instance a shell script that uses sed or awk.


Solution

  • Sed is probably the simplest approach for this:

    $ cat file
    sentence one
    sentence two.
    sentence three
    
    $ sed 's/[^.]$/&./' file
    sentence one.
    sentence two.
    sentence three.
    

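    The pattern [^.]$ matches the last character of any line that does not end with a period, and the replacement &. puts back that matched character followed by a period. Empty lines are left untouched because there is no character to match. Watch out for lines with trailing whitespace: if a period is followed by spaces, the period is no longer the last character, so a second period gets appended after the spaces.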

    Edit:

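    With awk I would do the following. Setting FS to the empty string makes every character its own field, so $(NF+1)="." appends a period as one more field. POSIX leaves an empty FS unspecified, but common implementations such as gawk and mawk split into characters: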

    $ awk '/[^.]$/{$(NF+1)="."}1' FS= OFS= file
    sentence one.
    sentence two.
    sentence three.
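
The trailing-whitespace caveat mentioned for the sed command can be handled by stripping that whitespace before testing the last character. The following is a minimal sketch rather than part of the original answer: the function names add_period_sed and add_period_awk are hypothetical, and it assumes that removing trailing spaces and tabs from every line is acceptable for the corpus. The awk variant appends to $0 directly, so it does not depend on an empty FS.

```shell
#!/bin/sh
# Hypothetical helper names; neither function appears in the original answer.

# sed variant: strip trailing whitespace first, then append a period
# to any non-empty line whose last character is not already a period.
add_period_sed() {
    sed 's/[[:space:]]*$//; s/[^.]$/&./' "$@"
}

# awk variant: same two steps (spaces and tabs only), appending to $0
# directly instead of relying on an empty FS.
add_period_awk() {
    awk '{ sub(/[ \t]+$/, "") } /[^.]$/ { $0 = $0 "." } 1' "$@"
}

# Example: the first two input lines carry trailing spaces.
printf 'sentence one  \nsentence two. \nsentence three\n' | add_period_sed
```

Both functions read standard input or the files passed as arguments, for example add_period_awk file > file.fixed. Lines that are empty, or that become empty once the whitespace is stripped, are left without a period.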