How can we recursively grep through a directory and extract the content between the tags shown below, together with the line numbers and the file in which those lines are located?
... < start > contents to be extracted
this line as well
and this line
and before the tag < / start >
If it has to be grep, use this command:
grep -PzoHnr "(?s)< start >.*< / start >" .
Explanation:
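-P: Activates Perl-compatible regular expressions (PCRE).
-z: Treats the input as a set of lines, each terminated by a zero byte instead of a newline, so a whole file can be matched as one record.
-o: Prints only the matching part.
-H: Adds the filename in front of the match.
-n: Adds the line number in front of the match. Because of -z, a file without zero bytes counts as a single line, so this number is typically 1 rather than the position of the tag.
-r: Reads all files under each directory, recursively.
(?s): Activates PCRE_DOTALL, which means that . matches any character, including a newline.
< start >.*< / start >: The regular expression. Note that .* is greedy, so with several blocks in one file the match runs from the first < start > to the last < / start >; .*? matches each block separately.
Alternatively, here is an awk solution as well: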
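find . -type f -exec awk '/< start >/,/< \/ start >/ {print FILENAME ":" FNR ":" $0}' {} +
Explanation:
/< start >/,/< \/ start >/: A range pattern that selects every line from one containing < start > through the next one containing < / start >. The angle brackets are left unescaped because GNU awk interprets \< and \> as word boundaries, so the escaped form would not match there.
{print FILENAME ":" FNR ":" $0}: Prints the filename, the line number within that file, and the line itself.
find . -type f -exec ... {} +: Runs awk on all regular files under the current directory, recursively. Unlike $(find . -type f), this also handles file names that contain spaces.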
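To try the awk approach without touching real data, here is a minimal sketch that builds a throwaway directory and runs the range pattern over it. The directory layout and the file name sample.txt are made up for this demo. It writes the tags with plain < and > (GNU awk reads \< and \> as word boundaries) and passes files through find -exec so that names with spaces survive.

```shell
#!/bin/sh
# Hypothetical scratch directory and file name, used only for this demo.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub dir"

# Sample input: the tagged block from the question, padded with two extra lines.
cat > "$demo_dir/sub dir/sample.txt" <<'EOF'
header line
... < start > contents to be extracted
this line as well
and this line
and before the tag < / start >
trailer line
EOF

# Range pattern: print every line from the opening tag through the closing tag,
# prefixed with the file name and the per-file line number (FNR).
# find -exec ... {} + hands the file names to awk without word splitting.
out=$(cd "$demo_dir" && find . -type f -exec awk \
    '/< start >/,/< \/ start >/ { print FILENAME ":" FNR ":" $0 }' {} +)

printf '%s\n' "$out"

# Clean up the scratch directory.
rm -rf "$demo_dir"
```

With this sample file, the four lines from the opening tag to the closing tag are printed as ./sub dir/sample.txt:2:... through ./sub dir/sample.txt:5:..., while the header and trailer lines are skipped.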