
Substitute the ICON reference for nothing

I have a Firefox export file with more than 2600 bookmarks that I want to import into Buku, which seems to choke on the ICON attribute in the HTML file. So I want to replace every ICON reference with nothing. Here's an example, the shortest one:


I've tried

sed -e 's/^ICON=\"data:image\/png;base64,^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/][AQgw]==|[A-Za-z0-9+/]{2}[AEIMQUYcgkosw048]=)?$\"$//g' firefox_bookmarks_copie.html > test1.html

sed -e 's/^ICON="[[:print:]]"$//gi' firefox_bookmarks_copie.html > test2.html

sed -e 's/^ICON="data:image(\/[^;]+;base64[^"]+)"$//g' firefox_bookmarks_copie.html > test3.html

awk '{gsub(/^ICON="[:print:]"$/,"");}' firefox_bookmarks_copie.html > copie4.html

awk seems to cause problems when saving to copie4.html.

perl -0pe 's/^ICON="data:image(\/[^;]+;base64[^"]+)"$//' firefox_bookmarks_copie.html >> copie5.html

The site seems to be telling me that my substitution regex is effective with


Can you help me?


  • Assumptions:

    • OP wants to remove ALL ICON="..." strings from the html file

    Using the following (heavily) modified sample html file for demo purposes:

    $ cat bm.html
    <!DOCTYPE html>
    ... some other stuff ...
    ... some other stuff ...
          <DT><A HREF="" ICON="...snip_#1...z4gPC9zdmc+">some description</A>
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
          <DT><A HREF="" ICON="...snip_#2...z4gPC9zdmc+">some description</A>
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

    NOTE: the ^^^^^^^^^^^^^^ lines do not exist in bm.html but are added here to highlight the strings we're looking for

    General approach - look for the consecutive strings a) ICON=", b) [^"]* (string that contains no double quotes) and c) "

    One sed idea:

    $ sed 's/ICON="[^"]*"//g' bm.html
    <!DOCTYPE html>
    ... some other stuff ...
    ... some other stuff ...
          <DT><A HREF="" >some description</A>
          <DT><A HREF="" >some description</A>
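
The attempts in the question wrap the pattern in ^ and $, which only match when ICON="..." makes up the entire line. In the bookmark file the attribute sits in the middle of a line, so an anchored pattern never matches. A minimal sketch of the difference, assuming a hypothetical one-line sample held in a shell variable:

```shell
# Hypothetical sample line (not taken from the real bookmark file).
sample='      <DT><A HREF="" ICON="abc">some description</A>'

# Anchored: ICON must start the line and the closing quote must end it,
# so nothing matches and the line comes back unchanged.
anchored=$(printf '%s\n' "$sample" | sed 's/^ICON="[^"]*"$//g')

# Unanchored: the attribute is matched wherever it occurs on the line.
unanchored=$(printf '%s\n' "$sample" | sed 's/ICON="[^"]*"//g')

printf '%s\n' "$anchored" "$unanchored"
```

To keep the result, redirect the output to a new file as in the question, e.g. `sed 's/ICON="[^"]*"//g' firefox_bookmarks_copie.html > test1.html`.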

    One awk idea:

    $ awk '{gsub(/ICON="[^"]+"/,"")}1' bm.html
    <!DOCTYPE html>
    ... some other stuff ...
    ... some other stuff ...
          <DT><A HREF="" >some description</A>
          <DT><A HREF="" >some description</A>
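
The trailing 1 in the awk command matters: it is a pattern that is always true, so awk applies its default action and prints each (modified) line. The awk attempt in the question has a gsub() but nothing that prints, which would likely explain an empty copie4.html. A small sketch, again assuming a hypothetical one-line sample:

```shell
# Hypothetical sample line (not taken from the real bookmark file).
sample='<DT><A HREF="" ICON="abc">some description</A>'

# gsub() alone modifies the line but never prints it, so the output is empty.
no_print=$(printf '%s\n' "$sample" | awk '{gsub(/ICON="[^"]+"/,"")}')

# The always-true pattern 1 triggers awk's default print action.
with_print=$(printf '%s\n' "$sample" | awk '{gsub(/ICON="[^"]+"/,"")}1')

printf 'without 1: [%s]\nwith 1:    [%s]\n' "$no_print" "$with_print"
```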

    NOTE: for this particular html file the global options (sed: the /g flag; awk: gsub() as opposed to sub()) are overkill since there's only one match per line; if the linefeeds were removed (leaving a single long line of data), the global options would ensure all ICON="..." matches are replaced within that single line
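
The effect of the global flag can be sketched on a single line that holds two ICON attributes. The input below is a made-up example, not data from the real bookmark file:

```shell
# Hypothetical single-line input containing two ICON attributes.
line='<A HREF="" ICON="aaa">one</A><A HREF="" ICON="bbb">two</A>'

# Without the g flag, sed replaces only the first match on the line.
first_only=$(printf '%s\n' "$line" | sed 's/ICON="[^"]*"//')

# With the g flag, every match on the line is replaced.
all_matches=$(printf '%s\n' "$line" | sed 's/ICON="[^"]*"//g')

printf '%s\n' "$first_only" "$all_matches"
```

The first result still contains ICON="bbb"; the second has both attributes removed.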