Grep grammatical clauses


I'm trying to find a way to grep grammatical clauses from an ebook sample. Here's what the input looks like:

This is a test my friend, this is just a test; I'm going to do some shopping:`what do you need?`
Nothing, he said.

Desired output:

This is a test my friend
this is just a test
I'm going to do some shopping
what do you need
Nothing
he said

Any ideas on how one could achieve this?

Thank you very much!


Solution

  • You can use GNU awk (gawk), which treats a multi-character RS as a regular expression:

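    awk -v RS='[\n.,;:`?]+' -v ORS='\n' '{$1=$1} 1' file

  Output:

    This is a test my friend
    this is just a test
    I'm going to do some shopping
    what do you need
    Nothing
    he said

  Each run of newlines or of the characters . , ; : ` ? ends a record, and {$1=$1} makes awk rebuild the record from its fields, which drops the space left after a comma or semicolon.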
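POSIX awk only uses the first character of RS (a multi-character RS is unspecified), so the command above depends on gawk or another awk with regex RS support. A portable sketch under that assumption is to split each line with split(), which does accept an extended regular expression as the separator, then trim and drop empty pieces. This per-line approach is a swapped-in technique, not part of the original answer; the file name sample.txt and the function name split_clauses are hypothetical.

```shell
# sample.txt and split_clauses are hypothetical names used only for this sketch.
# The quoted heredoc delimiter keeps the backticks in the sample literal.
cat > sample.txt <<'EOF'
This is a test my friend, this is just a test; I'm going to do some shopping:`what do you need?`
Nothing, he said.
EOF

split_clauses() {
  awk '{
    # split() treats a multi-character separator string as an ERE
    n = split($0, parts, "[.,;:`?]+")
    for (i = 1; i <= n; i++) {
      # trim leading and trailing blanks from each piece
      gsub(/^[ \t]+|[ \t]+$/, "", parts[i])
      # skip empty pieces, such as the one after the closing ?`
      if (parts[i] != "") print parts[i]
    }
  }' "$@"
}

split_clauses sample.txt
```

Because it works line by line, a newline also ends a clause, so on this sample it should print the same six lines as the gawk command.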