
HtmlDocument.Save (HtmlAgilityPack) outputs incomplete document


We are using HtmlAgilityPack to save HTML ... the output is being trimmed, and we do not understand why.

The code we are using to create the export:

var doc = new HtmlDocument();

string html = "<head>";

html += "<title>Page Title</title>";      
html += "<style>" + style + "</style>";
html += "</head><body>";
html += body; // string is not very long
html += "<script>" + js + "</script>";   
html += "</body>";

FileStream sw = new FileStream(html_file, FileMode.Create);
doc.LoadHtml(html);
doc.Save(sw);
sw.Close();

The exported file body is trimmed. What are we doing wrong?

The full string is fairly small and straightforward: it contains no scripts, no special characters, nothing of that sort. The export is cut off in the middle of the "Additional Charges" title, in the second "partial" table, right after its title.

<div class="page-body">
                    <div class="top-title">1.Bill Summary <small style="font-size:14px;">1/2</small></div>
                    <div class="title" string="Device">
                        Period And Contract Information
                    </div>
                    <table class="partial">
                        <tr><td class="property">Maximum Half Hourly Demand:</td><td class="value">47,000 KWh</td></tr>
                        <tr><td class="property">Minimum Monthly Load Factor:</td><td class="value">57.2%</td></tr>
                        <tr><td class="property">Actual Maximum Demand:</td><td class="value">40,843 KWh</td></tr>
                        <tr><td class="property">Actual Load Factor:</td><td class="value">69.2%</td></tr>
                        <tr><td class="property">Period-to-date availability</td><td class="value">95.8%</td></tr>
                        <tr><td class="property">Contract Discount</td><td class="value">0.00%</td></tr>
                        <tr><td class="property">Contract Discount - Peak</td><td class="value">0.00%</td></tr>
                        <tr><td class="property">Contract Discount - Shoulder</td><td class="value">0.00%</td></tr>
                        <tr><td class="property">Contract Discount - Off Peak</td><td class="value">0.00%</td></tr>
                    </table>
                    <div class="title">
                        Bill Summary
                    </div>
                    <table class="partial">
                        <tr><td class="property">Energy Consumption</td><td class="value">7,072,662.46 ILS</td></tr>
                        <tr><td class="property">Fixed Fee to BB</td><td class="value">5,698.48 ILS</td></tr>
                        <tr><td class="property">Power Factor Fee to BB</td><td class="value"></td></tr>
                        <tr><td class="property">Other Fees to BB</td><td class="value"></td></tr>
                        <tr><td class="property">Min. Monthly Quantity charge</td><td class="value">66,791,095.60 ILS</td></tr>
                        <tr><td class="property">Additional Charges</td><td class="value">0.00 ILS</td></tr>
                        <tr><td class="property">Interest on Arrears</td><td class="value">0.00 ILS</td></tr>
                    </table>
                    <div class="title total">
                        <span style="display: inline-block;width: 280px;">Total Bill</span><b>7,078</b>
                    </div>
                    <table class="partial">
                        <tr><td class="property">Monthly Discount</td><td class="value">371</td></tr>
                        <tr><td class="property">Bill For Energy</td><td class="value">7,444</td></tr>
                    </table>
                </div>

Solution

  • I'm not sure which versions of .NET and HtmlAgilityPack you are using. I was able to reproduce the problem on .NET 4.0 with HtmlAgilityPack 1.3.0.0, but I can't tell whether those match your versions.

    Anyway, this looks like an HtmlAgilityPack bug: Save(Stream) seems to create a StreamWriter without setting AutoFlush to true and never flushes it, so whatever is still sitting in the writer's buffer is lost when you close the stream.

    The good news is that you can pass Save your own StreamWriter instead of a Stream.

    Your code adjusted based on the results I got:

    var doc = new HtmlDocument();
    
    string html = "<head>";
    
    html += "<title>Page Title</title>";      
    html += "<style>" + style + "</style>";
    html += "</head><body>";
    html += body; // string is not very long
    html += "<script>" + js + "</script>";   
    html += "</body>";
    
    doc.LoadHtml(html);
    // Needs: using System.IO; using System.Text; using HtmlAgilityPack;
    using (FileStream fs = new FileStream(html_file, FileMode.Create))
    using (StreamWriter sw = new StreamWriter(fs, Encoding.UTF8) { AutoFlush = true })
    {
        doc.Save(sw);
        // No explicit Close() is needed: leaving the using block disposes
        // the writer, which flushes it and closes the underlying stream.
    }
    

    Note that I couldn't reproduce it on the latest versions of .NET and HtmlAgilityPack.
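
To see the buffering effect in isolation, here is a minimal sketch that uses only the .NET base class library and does not involve HtmlAgilityPack at all. The class name BufferedWriterDemo and the helper BytesReachingStream are made up for this illustration. It assumes the text is shorter than the writer's internal buffer, which should hold for a short string like this one. It writes through a StreamWriter with AutoFlush left at its default (false) and reports how many bytes actually arrived in the underlying stream.

```csharp
using System;
using System.IO;
using System.Text;

public static class BufferedWriterDemo
{
    // Hypothetical helper for this illustration: writes text through a
    // StreamWriter (AutoFlush is false by default) and returns the number
    // of bytes that actually reached the underlying stream.
    public static long BytesReachingStream(string text, bool flush)
    {
        var stream = new MemoryStream();
        // UTF-8 without a byte order mark, so only the text itself is counted.
        var writer = new StreamWriter(stream, new UTF8Encoding(false));

        writer.Write(text);
        if (flush)
        {
            writer.Flush(); // pushes the buffered characters into the stream
        }

        // The writer is deliberately not disposed: this mimics a writer that
        // is abandoned while someone else closes the stream underneath it.
        return stream.Length;
    }

    public static void Main()
    {
        const string row = "<td class=\"property\">Additional Charges</td>";
        Console.WriteLine("without Flush: " + BytesReachingStream(row, false));
        Console.WriteLine("with Flush:    " + BytesReachingStream(row, true));
    }
}
```

Without the Flush call, the short text stays in the writer's buffer and the stream is still empty. With a real FileStream, and with more output than fits in the buffer, the same effect would show up as a file that is missing its tail. Setting AutoFlush = true, or disposing the writer before the stream is closed, avoids this.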