I'd like to test that some (de)serialization code is 100% symmetrical but I've found that, when serializing my data as a nested Tag, the inner Tags don't have newline characters between them like the original Tag did.
Is there some way to force newline characters between nested Tags without having to prettify -> re-parse?
Example:
import bs4
from bs4 import BeautifulSoup
tag_string = "<OUTER>\n<INNER/>\n</OUTER>"
tag = BeautifulSoup(tag_string, "xml").find("OUTER")
print(tag)
... <OUTER>
... <INNER/>
... </OUTER>
new_tag = bs4.Tag(name="OUTER")
new_tag.append(bs4.Tag(name="INNER", can_be_empty_element=True))
tag == new_tag
... False
print(new_tag)
... <OUTER><INNER/></OUTER>
In order to get a 100% symmetrical serialization, I have to prettify the new OUTER Tag, strip the indentation, and re-parse it:
new_tag = BeautifulSoup(
new_tag.prettify().replace("\n ", "\n"), "xml"
).find("OUTER")
tag == new_tag
... True
print(new_tag)
... <OUTER>
... <INNER/>
... </OUTER>
Justification for this need:
BeautifulSoup has a NavigableString element that can represent the newline characters inside an XML structure, so the tree can be built directly without converting to a string and parsing it back into a bs4.element.Tag.
Example:
import bs4
from bs4 import BeautifulSoup
tag_string = "<OUTER>\n<INNER/>\n</OUTER>"
tag = BeautifulSoup(tag_string, "xml").find("OUTER")
new_tag = bs4.Tag(name="OUTER")
new_tag.append(bs4.NavigableString("\n"))
new_tag.append(bs4.Tag(name="INNER", can_be_empty_element=True))
new_tag.append(bs4.NavigableString("\n"))
tag == new_tag
... True
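If there are several inner Tags, the same idea can be wrapped in a small helper. The sketch below is only an illustration: `build_with_newlines` is a hypothetical name, not part of bs4, and it assumes every child is an empty-element Tag separated by a single "\n", as in the example above.

```python
import bs4


def build_with_newlines(name, child_names):
    """Hypothetical helper: build a Tag whose children are separated by "\n"."""
    parent = bs4.Tag(name=name)
    # Newline right after the opening tag, as in the parsed original.
    parent.append(bs4.NavigableString("\n"))
    for child_name in child_names:
        parent.append(bs4.Tag(name=child_name, can_be_empty_element=True))
        # Newline after each child, so the closing tag starts on its own line.
        parent.append(bs4.NavigableString("\n"))
    return parent


new_tag = build_with_newlines("OUTER", ["INNER"])
print(new_tag)
```

Because the whitespace lives in the tree as ordinary NavigableString children, the comparison against the parsed Tag works without any prettify and re-parse step. Documents with indentation or mixed content would need the matching whitespace strings instead of a bare "\n".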