Tags: regex, bash, sed, non-greedy

Bash: uppercase text inside an HTML tag with sed


echo -e '<h1>abcd</h1>\n<h2>efgh</h2>' | sed 's#<h1>(.*?)<\h1>#<h1>\U&</h1>#g'

The desired output is:

<h1>ABCD</h1>
<h2>efgh</h2>

Any ideas? Thanks.


Solution

  • This works only for your specific case; it does not parse HTML.

    DISCLAIMER

    First read: https://stackoverflow.com/a/1732454/7939871

    Handling HTML with a sed search-and-replace regular expression is a shortcut, not real parsing.

    Do not use it in any kind of production setup: it breaks on many valid HTML syntax and layout variations, such as namespaces, multi-line elements, spacing, nesting, attributes, entities, CDATA…

    sed -E 's#<h1>(.*)</h1>#<h1>\U\1\E</h1>#g' <<<$'<h1>abcd</h1>\n<h2>efgh</h2>'
    

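    In the replacement, \U switches upper-casing on, \1 inserts captured group 1, and \E switches upper-casing off again so that the closing </h1> is left unchanged. \U and \E are GNU sed extensions.

    Your attempt failed for several reasons: sed regular expressions have no non-greedy .*?; without -E the parentheses are literal characters rather than a capture group; <\h1> should be </h1>; and & stands for the whole match, tags included, not for the captured text.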
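Since the question asked for non-greedy matching, here is a minimal sketch of the usual sed workaround: replace .* with [^<]*, which stops at the next '<'. It assumes GNU sed (for \U and \E) and that the heading text contains no nested tags. The function name upper_h1 is made up for illustration.

```shell
#!/usr/bin/env bash
# Requires GNU sed: \U and \E are GNU extensions to the s command.

# upper_h1 (hypothetical helper name) reads lines on stdin and upper-cases
# the text of every simple <h1>...</h1> pair found on each line.
upper_h1() {
  # [^<]* cannot run past the next '<', so it stands in for the
  # non-greedy .*? that sed does not support.
  sed -E 's#<h1>([^<]*)</h1>#<h1>\U\1\E</h1>#g'
}

# Two headings on one line, to show each pair is matched separately.
printf '%s\n' '<h1>abcd</h1> <h1>ijkl</h1>' '<h2>efgh</h2>' | upper_h1
# <h1>ABCD</h1> <h1>IJKL</h1>
# <h2>efgh</h2>
```

With the greedy (.*) the two headings on the first line would be captured as one group, and the tags between them would be upper-cased too. The disclaimer above still applies: this handles only flat, single-line <h1> elements without attributes.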