Hi, I am trying to parse some HTML. Here are a few lines of it:
<link rel="stylesheet" href="assets/css/fontawesome-min.css">
<link rel="stylesheet" href="assets/css/bootstrap.min.css">
<link rel="stylesheet" href="assets/css/xsIcon.css">
When I load this into BeautifulSoup and print it, the attributes are reordered alphabetically, as shown below:
<link href="assets/css/fontawesome-min.css" rel="stylesheet"/>
<link href="assets/css/bootstrap.min.css" rel="stylesheet"/>
<link href="assets/css/xsIcon.css" rel="stylesheet"/>
You can see the difference: originally rel came before href, but after simply loading the file and writing it out again, the attribute order has changed.
Is there any way to prevent this from happening? Thanks.
From the documentation, you can use a custom HTMLFormatter:
from bs4 import BeautifulSoup
from bs4.formatter import HTMLFormatter
txt = '''<link rel="stylesheet" href="assets/css/fontawesome-min.css">
<link rel="stylesheet" href="assets/css/bootstrap.min.css">
<link rel="stylesheet" href="assets/css/xsIcon.css">'''
class UnsortedAttributes(HTMLFormatter):
    # Yield attributes in the order they were parsed,
    # instead of the default alphabetical order.
    def attributes(self, tag):
        for k, v in tag.attrs.items():
            yield k, v
soup = BeautifulSoup(txt, 'html.parser')
#before HTMLFormatter
print( soup )
print('-' * 80)
#after HTMLFormatter
print( soup.encode(formatter=UnsortedAttributes()).decode('utf-8') )
Prints:
<link href="assets/css/fontawesome-min.css" rel="stylesheet"/>
<link href="assets/css/bootstrap.min.css" rel="stylesheet"/>
<link href="assets/css/xsIcon.css" rel="stylesheet"/>
--------------------------------------------------------------------------------
<link rel="stylesheet" href="assets/css/fontawesome-min.css"/>
<link rel="stylesheet" href="assets/css/bootstrap.min.css"/>
<link rel="stylesheet" href="assets/css/xsIcon.css"/>
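Since the goal in the question is to write the document back out, here is a minimal sketch of wrapping the same formatter in a helper that returns a string. The name `render_preserving_order` is hypothetical, and this assumes a Beautiful Soup release recent enough to ship `bs4.formatter` with an overridable `attributes()` method (4.8 or later, as far as I know).

```python
from bs4 import BeautifulSoup
from bs4.formatter import HTMLFormatter


class UnsortedAttributes(HTMLFormatter):
    """Formatter that keeps attributes in their parsed order."""

    def attributes(self, tag):
        # tag.attrs is a dict, so iteration follows insertion (source) order.
        for k, v in tag.attrs.items():
            yield k, v


def render_preserving_order(html):
    """Parse html and serialize it again without sorting attributes.

    Hypothetical helper, not part of Beautiful Soup itself.
    """
    soup = BeautifulSoup(html, 'html.parser')
    # decode() returns str; encode() would return bytes instead.
    return soup.decode(formatter=UnsortedAttributes())


if __name__ == '__main__':
    sample = '<link rel="stylesheet" href="assets/css/xsIcon.css">'
    print(render_preserving_order(sample))
```

The returned string can then be written to a file with an ordinary `open(..., 'w')` call. `soup.prettify(formatter=UnsortedAttributes())` should accept the same formatter if indented output is wanted, though it will also change the whitespace.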