Tags: python, html, svg, beautifulsoup

Replace img tag with in-line SVG with BeautifulSoup


I have an HTML file produced by pandoc, where SVG illustrations have been embedded. The SVG content is encoded in base64 and included in the src attribute of img elements. It looks like this:

<figure>
    <img role="img" aria-label="Figure 1" src="data:image/svg+xml;base64,<base64str>" alt="Figure 1" />
    <figcaption aria-hidden="true">Figure 1</figcaption>
</figure>

I'd like to replace each img element with the decoded SVG markup, using BeautifulSoup. Here's what I do:

from bs4 import BeautifulSoup
import base64

with open("file.html") as f:
    soup = BeautifulSoup(f, "html.parser")

# get all images
images = soup.find_all("img")

# try with the first one
# decode the SVG string from the src attribute
svg_str = base64.b64decode(images[0]["src"].split(",")[1]).decode()
# replace the tag with the string
images[0].replace_with(soup.new_tag(svg_str))

However, images[0] remains unchanged, although no error is raised. I've looked at examples on the Internet, but I can't figure out what I'm doing wrong.


Solution

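  • The problem is the call to soup.new_tag(svg_str). new_tag() takes a tag name (for example "svg") and creates one empty element with that name; it does not parse markup. Passing the whole decoded SVG string therefore creates a single element whose name is the entire string, and none of the SVG's own elements are built. replace_with() does swap that element into the tree, which is why no error is raised, but images[0] is only a reference to the old img tag, which is now detached from the soup, so printing it still shows the original element. To insert real SVG elements, parse the decoded string first and replace the img tag with the parsed result.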
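
To make the behaviour above concrete, here is a small sketch that reproduces it on an in-memory document. The tiny SVG string and the `x.svg` source are made up for illustration; only `new_tag()`, `replace_with()` and `find()` come from BeautifulSoup itself.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-ins for the decoded SVG and the pandoc output
svg_str = '<svg xmlns="http://www.w3.org/2000/svg"><circle r="5"/></svg>'
soup = BeautifulSoup('<figure><img src="x.svg" alt="Figure 1"/></figure>', "html.parser")

img = soup.find("img")
bogus = soup.new_tag(svg_str)  # new_tag() expects a tag *name*, not markup
img.replace_with(bogus)

print(bogus.name == svg_str)   # True: the whole string became the tag name
print(soup.find("img"))        # None: the img was removed from the tree
print(soup.find("circle"))     # None: no real SVG elements were created
print(img.parent)              # None: the old reference is detached
```

So the tree was modified, just not in a useful way, and inspecting the old `img` reference (or `images[0]` in the question) gives the impression that nothing happened.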

    Here's how you can achieve this:

    1. Decode the base64 string.
    2. Parse the decoded SVG string into a BeautifulSoup object.
    3. Replace the img tag with the parsed SVG content.

    Here's the corrected code:

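    import base64

    from bs4 import BeautifulSoup

    # prefix of the data URIs shown in the question
    SVG_PREFIX = "data:image/svg+xml;base64,"

    # pandoc writes UTF-8, so read and write with that encoding
    with open("file.html", encoding="utf-8") as f:
        soup = BeautifulSoup(f, "html.parser")

    # process each image
    for img in soup.find_all("img"):
        src = img.get("src", "")
        # skip images that are not base64-encoded SVG (PNG, external files, ...)
        if not src.startswith(SVG_PREFIX):
            continue
        # decode the SVG string from the src attribute
        svg_str = base64.b64decode(src[len(SVG_PREFIX):]).decode("utf-8")
        # parse the SVG string into a BeautifulSoup object
        svg_soup = BeautifulSoup(svg_str, "html.parser")
        # replace the img tag with the parsed SVG content
        img.replace_with(svg_soup)

    # save the modified HTML to a new file
    with open("modified_file.html", "w", encoding="utf-8") as f:
        f.write(str(soup))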
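
If you want to try the approach without touching files, the same steps can be wrapped in a function and run on a string. This is only a sketch under a few assumptions: the name `inline_svg_images` and the constant `SVG_PREFIX` are hypothetical, the data URI is assumed to have exactly the form shown in the question, and the sample SVG is invented. Unlike the code above, it inserts only the `<svg>` element found in the decoded markup, so an XML declaration or DOCTYPE in the original SVG file is not copied into the HTML.

```python
import base64

from bs4 import BeautifulSoup

# Assumed data-URI prefix, taken from the example in the question
SVG_PREFIX = "data:image/svg+xml;base64,"


def inline_svg_images(html: str) -> str:
    """Return html with base64-encoded SVG <img> tags replaced by inline <svg>."""
    soup = BeautifulSoup(html, "html.parser")
    for img in soup.find_all("img"):
        src = img.get("src", "")
        if not src.startswith(SVG_PREFIX):
            continue  # leave other images untouched
        svg_markup = base64.b64decode(src[len(SVG_PREFIX):]).decode("utf-8")
        # Keep only the <svg> element, dropping any XML prolog around it
        svg = BeautifulSoup(svg_markup, "html.parser").find("svg")
        if svg is None:
            continue  # decoded data did not contain an <svg> element
        img.replace_with(svg)
    return str(soup)


# Made-up sample input: one embedded SVG and one ordinary image
raw_svg = (
    '<?xml version="1.0"?>'
    '<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">'
    '<circle cx="5" cy="5" r="4"/></svg>'
)
encoded = base64.b64encode(raw_svg.encode("utf-8")).decode("ascii")
html = (
    "<figure>"
    f'<img role="img" aria-label="Figure 1" src="{SVG_PREFIX}{encoded}" alt="Figure 1" />'
    '<img src="photo.png" alt="Photo" />'
    '<figcaption aria-hidden="true">Figure 1</figcaption>'
    "</figure>"
)
result = inline_svg_images(html)
print(result)
```

Note that the `aria-label` and `alt` text of the replaced img are lost here; if they matter for accessibility, you would need to copy them onto the svg element yourself.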