Tags: bash, awk, sed, html-parsing

Extract content from div with awk/grep


Assume the following HTML:

<div class='requirement'>
<div class='req-title'>
The quick brown fox jumps over the lazy dog
</div>
</div>

I want to extract "The quick brown fox jumps over the lazy dog" using tools like awk or sed. I'm pretty sure it can be done.

I know an HTML parser is the right tool for this job, but this is the only time I'll be dealing with HTML content.


Solution

  • Assuming the part you want to print is a single line:

    $ awk 'f{print; exit} $0=="<div class=\047req-title\047>"{f=1}' file
    The quick brown fox jumps over the lazy dog
    

    otherwise:

    $ awk 'f{if ($0=="</div>") exit; print} $0=="<div class=\047req-title\047>"{f=1}' file
    The quick brown fox jumps over the lazy dog
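Both commands compare the whole line ($0) against the tag text, so they only match when the tags sit on their own lines with no surrounding whitespace. If the real input is indented, a variant along the following lines may help. This is only a sketch: the indented sample input, the temporary file, and the variable names tmp, line, and result are assumptions for illustration, not part of the original question.

```shell
# Hypothetical indented version of the sample input, written to a temp file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<div class='requirement'>
  <div class='req-title'>
    The quick brown fox jumps over the lazy dog
  </div>
</div>
EOF

# Same flag-based approach as above, but each line is trimmed of leading
# and trailing spaces/tabs before it is compared or printed.
result=$(awk '
  { line = $0; gsub(/^[ \t]+|[ \t]+$/, "", line) }   # trimmed copy of the line
  f { if (line == "</div>") exit; print line }        # inside the div: print until it closes
  line == "<div class=\047req-title\047>" { f = 1 }   # opening tag seen: start printing next line
' "$tmp")

printf '%s\n' "$result"
rm -f "$tmp"
```

Like the original commands, this still assumes each tag is on its own line and that the attribute is quoted exactly as shown; anything less regular is a job for a real HTML parser.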