Using sed to modify a large HTML list of links


What I've got:

<ul>
<li><a href="https://example.com.com/link-1"></li>
<li><a href="https://example.com.com/link-2"></li>
<li><a href="https://example.com.com/link-3" ></li>
<!-- many more items here --> 
</ul>

Desired end result:

<ul>
<li><a href="link-1.html"></li>
<li><a href="link-2.html"></li>
<li><a href="link-3.html" ></li>
<!-- many more items here --> 
</ul>

Currently I've come up with something like:

sed 's/https:\/\/example.com.com//g' test.txt | sed 's/" *>/.html">/g'

But this is clearly (a) inefficient, since it runs sed twice, and (b) won't work for in-place editing, because of the pipe (i.e. sed -i used in conjunction with find, for example).

What would be a better approach for this?


Solution

  • You could use a capturing group to avoid the second sed invocation, like this:

    sed -e 's%https://[^"]*/\([^"]*\)%\1.html%'
    

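    Using % as the delimiter avoids having to escape the forward slashes. The greedy [^"]*/ consumes everything up to the last slash inside the quoted URL, so \1 holds only the final path segment (e.g. link-1), to which .html is appended.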

    Edit

    If you want to make sure the substitution only applies to URLs beginning with https://example.com, and only on lines that start with an <li><a ...> tag, you could restrict the command with an address:

    sed -e '/^<li><a /s%"https://example.com[^"]*/\([^"]*\)%"\1.html%'
    

    Based on the data sample you provided, you should get:

    <ul>
    <li><a href="link-1.html"></li>
    <li><a href="link-2.html"></li>
    <li><a href="link-3.html" ></li>
    <!-- many more items here --> 
    </ul>
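
    Since the question also asks about in-place editing with find, here is a minimal sketch of how the single-pass command could be combined with the two. The scratch directory, the file name links.html and the *.html filter are assumptions made up for this demonstration. The -i option is not part of POSIX sed; the attached-suffix form -i.bak used here is accepted by both GNU and BSD sed and keeps a backup of each file.

    ```shell
    # Hypothetical scratch directory and file name, used only for this demonstration.
    workdir=$(mktemp -d)
    printf '%s\n' \
        '<ul>' \
        '<li><a href="https://example.com.com/link-1"></li>' \
        '<li><a href="https://example.com.com/link-2"></li>' \
        '<li><a href="https://example.com.com/link-3" ></li>' \
        '</ul>' > "$workdir/links.html"

    # Rewrite every .html file under the directory in place.
    # Each original is kept next to the edited file with a .bak suffix.
    find "$workdir" -type f -name '*.html' -exec \
        sed -i.bak -e '/^<li><a /s%"https://example.com[^"]*/\([^"]*\)%"\1.html%' {} +

    # Show the edited file.
    result=$(cat "$workdir/links.html")
    printf '%s\n' "$result"
    ```

    Once you have checked the result, the .bak files can be deleted. Running the command on a copy of the data first is probably a good idea, as in-place edits are otherwise hard to undo.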
    

    Hope that helps.