regex grep html-parsing

How to match content between specific HTML tags with an attribute using grep?


Which regular expression should I use with grep to match the text between the tag <div class="Message"> and its closing tag </div> in an HTML file?


Solution

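  • Here's one way using GNU grep. The -P option enables Perl-compatible
    regular expressions (needed for the lookarounds) and -o prints only the
    matched text. Note that the pattern expects a single space between each
    tag and the text: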

    grep -oP '(?<=<div class="Message"> ).*?(?= </div>)' file
    

    If your tags span multiple lines, delete the newlines first so that grep sees the whole file as a single line:

    < file tr -d '\n' | grep -oP '(?<=<div class="Message"> ).*?(?= </div>)'
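The patterns above only match when a literal space sits between each tag and the text. If the spacing in your file varies, a possible variant is sketched below. It uses \K (discard everything matched so far) instead of a lookbehind, because a PCRE lookbehind cannot contain an unbounded quantifier such as \s*. The sample files and their contents are made up for illustration, and this assumes GNU grep built with PCRE support.

```shell
# Hypothetical sample input, written to temporary files for illustration only.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<html>
<div class="Message"> first message </div>
<div class="Other"> ignored </div>
<div class="Message">second message</div>
</html>
EOF

# Original pattern: requires one literal space on each side of the text,
# so the second Message div (no spaces) is not matched.
strict=$(grep -oP '(?<=<div class="Message"> ).*?(?= </div>)' "$sample")

# \K variant: \s* absorbs optional whitespace and \K drops the opening tag
# from the reported match; the lookahead tolerates whitespace before </div>.
loose=$(grep -oP '<div class="Message">\s*\K.*?(?=\s*</div>)' "$sample")

# Same variant for a tag that spans several lines, after deleting newlines.
wrapped=$(mktemp)
cat > "$wrapped" <<'EOF'
<div class="Message">
wrapped message
</div>
EOF
joined=$(tr -d '\n' < "$wrapped" | grep -oP '<div class="Message">\s*\K.*?(?=\s*</div>)')

printf '%s\n' "$strict" "$loose" "$joined"
rm -f "$sample" "$wrapped"
```

Like the original, this is still a line-oriented regex match and not an HTML parser: nested <div> elements, or a different attribute order or quoting style, will not be handled.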