BeautifulSoup reorders the attributes of HTML tags

I have an issue with BeautifulSoup. Whenever I parse an HTML input, it changes the order of the attributes (e.g. class, id) of the HTML tags.

For example:

from bs4 import BeautifulSoup

tags = BeautifulSoup('<span id="100" class="test"></span>', "html.parser")
print(str(tags))

Prints:

<span class="test" id="100"></span>

As you can see, the class and id order was changed. How can I prevent such behavior?

I am unfamiliar with web development, but I know that the order of the attributes doesn't matter.

My main goal is to preserve the original form of the HTML input after parsing it, because I want to loop through the tags and match them, character by character, against other HTML text.

Solution

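As you stated, the order of attributes in HTML doesn't matter. The reordering happens on output: BeautifulSoup's default formatter sorts each tag's attributes by name when it serializes the tree. If you really want unsorted attributes, you can subclass HTMLFormatter and override its attributes() method: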

from bs4 import BeautifulSoup
from bs4.formatter import HTMLFormatter


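class UnsortedAttributes(HTMLFormatter):
    def attributes(self, tag):
        # Yield the attributes in the order they are stored on the tag
        # (source order) instead of sorting them by name.
        yield from tag.attrs.items()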


tags = BeautifulSoup('<span id="100" class="test"></span>', "html.parser")

print(tags.encode(formatter=UnsortedAttributes()).decode())

Prints:

<span id="100" class="test"></span>
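To see that the order is lost only when the tree is serialized, not when it is parsed, here is a minimal sketch. It assumes a Python version where dicts keep insertion order (3.7+); the variable names are just for illustration.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup('<span id="100" class="test"></span>', "html.parser")

# The parsed tag keeps its attributes in a dict, in source order.
attr_names = list(soup.span.attrs)

# The default formatter sorts the attributes by name on output.
default_output = str(soup)

print(attr_names)      # ['id', 'class']
print(default_output)  # <span class="test" id="100"></span>
```

So the original order is still available on each tag, and a custom formatter only has to stop the sorting step.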

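EDIT: By default, void tags such as <input> are written with a closing slash (<input/>). To output them without it, pass void_element_close_prefix="" to the formatter: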

from bs4 import BeautifulSoup
from bs4.formatter import HTMLFormatter


class UnsortedAttributes(HTMLFormatter):
    def __init__(self):
        # An empty prefix writes void tags as <input ...> instead of <input .../>.
        super().__init__(void_element_close_prefix="")

    def attributes(self, tag):
        yield from tag.attrs.items()


tags = BeautifulSoup(
    """<input id="NOT_CLOSED_TAG" type="Button">""",
    "html.parser",
)

print(tags.encode(formatter=UnsortedAttributes()).decode())

Prints:

<input id="NOT_CLOSED_TAG" type="Button">
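Since the goal is character-level matching, it may be worth checking whether a given input actually survives the round trip. The sketch below is only an illustration: SourceOrderFormatter is a hypothetical name for the same formatter as UnsortedAttributes above, and round_trips is a hypothetical helper. Keeping attribute order does not guarantee identical output, because details such as the quote style around attribute values are normalized on output.

```python
from bs4 import BeautifulSoup
from bs4.formatter import HTMLFormatter


class SourceOrderFormatter(HTMLFormatter):
    """Hypothetical name; same idea as UnsortedAttributes above."""

    def __init__(self):
        # Write void tags without a closing slash.
        super().__init__(void_element_close_prefix="")

    def attributes(self, tag):
        # Keep the attributes in source order.
        yield from tag.attrs.items()


def round_trips(html: str) -> bool:
    """Return True if parsing and re-serializing gives back the same string."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.encode(formatter=SourceOrderFormatter()).decode() == html


print(round_trips('<span id="100" class="test"></span>'))        # True
print(round_trips('<input id="NOT_CLOSED_TAG" type="Button">'))  # True
# Single-quoted values are written back with double quotes, so this differs.
print(round_trips("<span id='100'></span>"))                     # False
```

If a check like this fails for your inputs, you would need to normalize both sides the same way before comparing them.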