recursion, grep, filepath, line-numbers

Recursively grep through a directory and extract the contents between the tags


How can we recursively grep through a directory and extract the contents between the tags shown below, together with the file location and line numbers of the extracted lines?

... < start > contents to be extracted
this line as well 
and this line
and before the tag < / start >

Solution

  • If it has to be grep, use this command:

    grep -PzoHnr "(?s)< start >.*< / start >" .
    

    Explanation:

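    • -P: Activate Perl-compatible regular expressions (PCRE)
    • -z: Treat the input as a set of lines, each terminated by a zero byte instead of a newline; a text file is therefore read as one long line, which lets the pattern match across newlines
    • -o: Print only the matched text
    • -H: Add the filename in front of each match
    • -n: Add the line number in front of each match; because of -z a file without zero bytes counts as a single line, so this number is 1 rather than the position of the match in the file
    • -r: Read all files under each directory, recursively
    • (?s): Activates PCRE_DOTALL, which means that . matches any character, including a newline
    • < start >.*< / start > is the regular expression; .* is greedy, so in a file with several tag pairs it runs from the first < start > to the last < / start >, whereas .*? stops at the first closing tag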
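Since -z also terminates every printed match with a zero byte, consecutive matches can run together on a terminal. The following is only a minimal sketch, assuming GNU grep built with PCRE support. It pipes the result through tr to turn the zero bytes into newlines, uses the lazy .*? form, and leaves out -n. The temporary directory, the file name sample.txt and the variable names are hypothetical and exist only for the demonstration.

```shell
# Create a throwaway directory with one sample file (hypothetical data).
demo_dir=$(mktemp -d)
printf '%s\n' 'intro' '... < start > contents to be extracted' \
    'this line as well' 'and before the tag < / start >' 'outro' \
    > "$demo_dir/sample.txt"

# -z ends every printed match with a zero byte; tr converts those bytes
# to newlines so that the matches are readable and fit in a variable.
grep_out=$(cd "$demo_dir" && \
    grep -PzoHr '(?s)< start >.*?< / start >' . | tr '\0' '\n')
printf '%s\n' "$grep_out"

# Remove the sample data again.
rm -rf "$demo_dir"
```

Each match is printed with its file name in front, and the lines outside the tags (intro and outro here) are not part of the output.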

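    Alternatively, here is an awk solution:

    find . -type f -exec awk '/< start >/,/< \/ start >/ {print FILENAME ":" FNR ":" $0}' {} +
    

    Explanation:

    • /< start >/,/< \/ start >/: A range pattern that selects every line from one containing < start > through the next one containing < / start >. Only the slash needs a backslash; < and > must stay unescaped, because GNU awk treats \< and \> as word-boundary operators
    • {print FILENAME ":" FNR ":" $0}: Prints the filename, the line number within that file, and the whole line, including any text before or after the tags
    • find . -type f -exec ... {} +: Finds only regular files under the current directory, recursively, and passes them to awk as arguments; unlike $(find . -type f), this also works for file names that contain spaces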
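To see what this kind of awk output looks like, here is a small self-contained sketch that runs a range pattern over a sample tree. The directory "sub dir", the file notes.txt and the variable names are hypothetical. The tags are written without backslashes before < and > so that the pattern behaves the same in GNU awk and other awk implementations, and find passes the files with -exec so that the space in the directory name is handled.

```shell
# Build a throwaway tree with a space in the directory name (hypothetical data).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub dir"
printf '%s\n' 'header' '... < start > contents to be extracted' \
    'this line as well' 'and before the tag < / start >' 'footer' \
    > "$demo_dir/sub dir/notes.txt"

# Print file name, per-file line number and text for every line in the range.
awk_out=$(cd "$demo_dir" && find . -type f -exec \
    awk '/< start >/,/< \/ start >/ {print FILENAME ":" FNR ":" $0}' {} +)
printf '%s\n' "$awk_out"

# Remove the sample data again.
rm -rf "$demo_dir"
```

For this sample the output is:

    ./sub dir/notes.txt:2:... < start > contents to be extracted
    ./sub dir/notes.txt:3:this line as well
    ./sub dir/notes.txt:4:and before the tag < / start >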