I am making a tool to translate the strings in an .epub file. I tried using HtmlAgilityPack to process the XHTML files unpacked from the epub.
The problem is that HtmlAgilityPack automatically removes the trailing slash from tags that have no end tag.
I have done some research, but have not found enough to solve the problem.
For example, the tag originally comes with a slash at the end:
<link href="style.css" rel="stylesheet" type="text/css" />
But once it is loaded into HtmlAgilityPack and written back out, the slash is removed:
<link href="style.css" rel="stylesheet" type="text/css">
I know both forms are valid in normal HTML pages, but that does not seem to be the case in the epub format. EPUBCheck always reports a fatal error if the slash is removed, and the file can't even be read if I just ignore the warning.
I have spent hours trying to fix this. Can someone give me a hand?
Thanks.
Set the OptionWriteEmptyNodes property to true on your HtmlDocument, so that empty elements such as <link> are written in self-closing form:
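using System.IO;
using HtmlAgilityPack;

// Read the unpacked XHTML file.
string htmltext = File.ReadAllText("test.html");
HtmlDocument doc = new HtmlDocument();
// Write empty elements in self-closing form, e.g. <link ... />.
doc.OptionWriteEmptyNodes = true;
doc.LoadHtml(htmltext);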
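The option only takes effect when the document is written back out, so it may help to see the full round trip. The following is a minimal sketch, not a definitive implementation: the class name EpubRoundTrip and the helper RoundTrip are hypothetical names made up for illustration, and the sample input is the <link> tag from the question. It assumes the HtmlAgilityPack NuGet package is referenced.

```csharp
using System;
using System.IO;
using HtmlAgilityPack;

public static class EpubRoundTrip
{
    // Hypothetical helper: parses markup and serializes it back to a string,
    // asking HtmlAgilityPack to keep empty elements self-closed.
    public static string RoundTrip(string markup)
    {
        var doc = new HtmlDocument();
        doc.OptionWriteEmptyNodes = true; // write <link ... /> instead of <link ...>
        doc.LoadHtml(markup);

        // Save to an in-memory writer so the result can be inspected
        // before it is written back into the epub.
        using (var writer = new StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }

    public static void Main()
    {
        string input = "<link href=\"style.css\" rel=\"stylesheet\" type=\"text/css\" />";
        Console.WriteLine(RoundTrip(input));
    }
}
```

For a real file, doc.Save("test.html") writes the result straight to disk instead. It is probably worth running EPUBCheck on the output again, since this option only addresses the self-closing slash and not other differences between HTML and XHTML serialization.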