I'm working on a project where I need to test some HTML output. To do this, I am using Jsoup to extract the elements I want to test against and running assertions on the result. The problem is that Jsoup 'cleans' the output before returning it, so my outputs do not match my inputs, even when the original HTML was correct. Every suggestion I have come across says to disable pretty printing via the output settings. Unfortunately, that has not worked so far. I'm not sure whether I am disabling the output formatting incorrectly or whether something else is going on. Any suggestions would be appreciated.
@Test
public void testJsoupParse()
{
    Document testDoc = Jsoup.parse("<html> <span id='sp1'><strong>ABC 123</strong></span> <span id='sp2'>XYZ 098</span> </html>");
    testDoc.outputSettings().prettyPrint(false);
    String sp1 = testDoc.select("span#sp1").text();
    System.out.println(sp1);
    String spHtml = testDoc.select("span#sp1").html();
    System.out.println(spHtml);
    //this should pass, but fails due to the extra space being stripped out
    assertThat(sp1).isEqualTo("ABC 123");
    //this will also fail since .html() will include the <strong> tags in the output
    assertThat(spHtml).isEqualTo("ABC 123");
    //this will pass
    assertThat(testDoc.select("span#sp2").text()).isEqualTo("XYZ 098");
}
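Use wholeText() instead of text(). text() returns the element's text with whitespace normalized and trimmed, whereas wholeText() returns it non-normalized, keeping the spaces and newlines present in the source: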
String sp1 = testDoc.getElementById("sp1").wholeText();
assertThat(sp1).isEqualTo("ABC 123");
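To make the difference concrete without pulling in Jsoup, here is a rough plain-Java sketch of what whitespace normalization does to a string. The class WhitespaceDemo and the helper normalise are hypothetical names made up for illustration and are not part of Jsoup; the regex only approximates the behaviour of text(), it is not the library's actual implementation.

```java
public class WhitespaceDemo {
    // Hypothetical helper, NOT part of Jsoup: approximates the whitespace
    // normalization that text() applies (collapse runs of whitespace, then trim).
    static String normalise(String raw) {
        return raw.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // Three spaces between the words, as they might appear in the source HTML.
        String raw = "ABC   123";

        // Collapsed to a single space, roughly what text() would hand back.
        System.out.println("[" + normalise(raw) + "]");

        // Left untouched, roughly what wholeText() would hand back.
        System.out.println("[" + raw + "]");
    }
}
```

If your expected string contains runs of spaces or newlines, an assertion against the normalized form will fail even though the markup is fine, which is why wholeText() is the better fit for an exact comparison.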
Side note: for your simple sample HTML it may make no difference, but select is not really the right call here. Instead of
String sp1 = testDoc.select("span#sp1").text();
use selectFirst or getElementById, since you want to get a single Element. select returns Elements (i.e. a list of elements). So something like
String sp1 = testDoc.selectFirst("span#sp1").wholeText();
or
String sp1 = testDoc.getElementById("sp1").wholeText();