Convert html to image with pagination using C#


I'm working on a Windows service in C# 4.0 that converts various file types to images (TIFF and JPEG).

I have a problem when I want to convert an HTML file (usually an e-mail) to an image.

I use a WebBrowser control:

var browser = new WebBrowser();
browser.DocumentCompleted += this.BrowserDocumentCompleted;
browser.DocumentText = html;

and DrawToBitmap in the DocumentCompleted handler:

var browser = sender as WebBrowser;
// Scaled bounds of the whole document (scaleFactor is defined elsewhere).
Rectangle scroll = browser.Document.Body.ScrollRectangle;
Rectangle body = new Rectangle(scroll.X * scaleFactor,
    scroll.Y * scaleFactor,
    scroll.Width * scaleFactor,
    scroll.Height * scaleFactor);

// Grow the control to the document height before drawing it.
browser.Height = body.Height;
Bitmap output = new Bitmap(body.Width, body.Height);
browser.DrawToBitmap(output, body);

It works fine for small or medium HTML, but with long HTML (around 22,000 px high or more) DrawToBitmap throws GDI+ exceptions:

  • Invalid parameter

  • Not a valid GDI+ image

According to what I found online, this kind of error happens because the image is too big.
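
As a rough illustration of why size matters here: `new Bitmap(width, height)` uses a 32-bit ARGB pixel format by default, so the pixel buffer needs about 4 bytes per pixel, and `scaleFactor` multiplies both dimensions. The sketch below only does that arithmetic. The `BitmapSizeEstimate` helper is a hypothetical name, and the 1000 px width used in the note after it is an assumption, not a value from my service.

```csharp
using System;

public static class BitmapSizeEstimate
{
    // Approximate pixel-buffer size in bytes for a 32bpp bitmap
    // whose width and height are both multiplied by scaleFactor.
    public static long Bytes(int width, int height, int scaleFactor)
    {
        long w = (long)width * scaleFactor;
        long h = (long)height * scaleFactor;
        return w * h * 4; // 4 bytes per pixel (32bpp ARGB)
    }
}
```

For an assumed 1000 x 22000 px document this gives about 88 MB at a scaleFactor of 1 and about 352 MB at a scaleFactor of 2, which is large enough that the allocation can plausibly fail.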

My question: how can I convert HTML into X images (pagination) without generating the big image and cropping it afterwards, and, if possible, without using a library?

Thank you in advance.

Edit: I found a tricky solution: wrap the HTML in one div that sets the page height and another that sets the offset, for example:

<div style="height:3000px; overflow:hidden"> 
<div style="margin-top:-3000px">

But this solution can cut through a line of text or the middle of an image...
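
The wrapper trick above could be generated per page roughly like this. It is only a sketch under some assumptions: `PageWrapper`, `PageCount` and `WrapPage` are hypothetical names, `html` is assumed to be a body fragment rather than a full document, and the total height is assumed to come from `Document.Body.ScrollRectangle.Height` after a first load. It has the same limitation of cutting through text or images.

```csharp
using System;

public static class PageWrapper
{
    // Number of pages needed to cover totalHeight (ceiling division).
    public static int PageCount(int totalHeight, int pageHeight)
    {
        if (pageHeight <= 0) throw new ArgumentOutOfRangeException("pageHeight");
        if (totalHeight <= 0) return 0;
        return (totalHeight + pageHeight - 1) / pageHeight;
    }

    // Wraps html so that only the slice for pageIndex (0-based) is visible:
    // the outer div clips to one page, the inner div shifts the content up.
    public static string WrapPage(string html, int pageIndex, int pageHeight)
    {
        int offset = pageIndex * pageHeight;
        return "<div style=\"height:" + pageHeight + "px; overflow:hidden\">"
             + "<div style=\"margin-top:-" + offset + "px\">"
             + html
             + "</div></div>";
    }
}
```

Each wrapped string would then be assigned to `DocumentText` in turn and captured in the `DocumentCompleted` handler, so no single bitmap is taller than the chosen page height.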


Solution

  • Thank you for your answer, Noseratio.

    I found a solution: print to a virtual printer to get the image files.

    Save the HTML to a file and strip the Content-Type meta tag (the encoding declaration):

    html = Regex.Replace(html, "<meta[^>]*http-equiv=\"Content-Type\"[^>]*>", string.Empty, RegexOptions.Multiline);
    using (var f = File.Create(filePath))
    {
       var bytes = Encoding.Default.GetBytes(html);
       f.Write(bytes, 0, bytes.Length);
    }
    

    Run the print job without showing the WebBrowser or the print dialog:

    const short PRINT_WAITFORCOMPLETION = 2;
    const int OLECMDID_PRINT = 6;
    const int OLECMDEXECOPT_DONTPROMPTUSER = 2;
    
    dynamic ie = browser.ActiveXInstance;
    ie.ExecWB(OLECMDID_PRINT, OLECMDEXECOPT_DONTPROMPTUSER, PRINT_WAITFORCOMPLETION);
    

    I use PDFCreator as the virtual printer, and it keeps all the files in a folder. It's not easy to collect these files (knowing when printing has finished, how many files there are, and when they can be used...), but that isn't the purpose of this post!