sed, grep, unix-text-processing

Extract substrings between strings


I have a file with text as follows:

###interest1 moreinterest1### sometext ###interest2###
not-interesting-line
sometext ###interest3###
sometext ###interest4### sometext othertext ###interest5### sometext ###interest6###

I want to extract every string enclosed between a pair of ### markers.

My desired output would be something like this:

interest1 moreinterest1
interest2
interest3
interest4
interest5
interest6

I have tried the following:

grep '###' file.txt | sed -e 's/.*###\(.*\)###.*/\1/g'

This almost works, but it only seems to grab the first instance per line, so for the first line my output contains only

interest1 moreinterest1

rather than

interest1 moreinterest1
interest2

Solution

  • Here is a single awk command that does this. It uses ### as the field separator and prints every even-numbered field:

    awk -F '###' '{for (i=2; i<NF; i+=2) print $i}' file

    Output:

    interest1 moreinterest1
    interest2
    interest3
    interest4
    interest5
    interest6
    

    Here is an alternative grep + sed solution:

    grep -oE '###[^#]*###' file | sed -E 's/^###|###$//g'
    

    The grep pattern uses [^#]*, so it assumes the text between the ### markers contains no # characters.
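
To make the # caveat concrete, here is a small sketch that runs both commands on one made-up input line (not from the question) whose payload contains a single #. The variable names line, awk_out and grep_out are only for illustration. awk splits on the literal three-character separator, so a lone # stays inside the field, while the [^#]* in the grep pattern cannot cross it.

```shell
# Hypothetical sample line: the text between the markers contains a lone '#'.
line='a ###C# rocks### b'

# awk approach: fields are split on the literal '###', so field 2 is 'C# rocks'.
awk_out=$(printf '%s\n' "$line" | awk -F '###' '{for (i=2; i<NF; i+=2) print $i}')

# grep + sed approach: '[^#]*' stops at the lone '#', so no match is found.
grep_out=$(printf '%s\n' "$line" | grep -oE '###[^#]*###' | sed -E 's/^###|###$//g')

printf 'awk:      [%s]\n' "$awk_out"    # prints: awk:      [C# rocks]
printf 'grep+sed: [%s]\n' "$grep_out"   # prints: grep+sed: []
```

If the data may contain # between markers, the awk version is probably the safer of the two. Neither approach can disambiguate a payload that begins or ends with # directly next to a marker.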