
Generate one PDF document with multiple pages by converting from HTML using iText 7


I'm working with iText 7. I've been able to take one HTML page and generate a PDF from it, but I need to generate a single PDF document from multiple HTML pages, with each HTML page on its own PDF page. For example, given Page1.html, Page2.html and Page3.html, I need a PDF document with 3 pages: the first page with the content of Page1.html, the second page with the content of Page2.html, and so on.

This is the code I have, and it works for one HTML page:

ConverterProperties properties = new ConverterProperties();              
PdfWriter writer = new PdfWriter(pdfRoot, new WriterProperties().SetFullCompressionMode(true));
PdfDocument pdfDocument = new PdfDocument(writer);
pdfDocument.AddEventHandler(PdfDocumentEvent.END_PAGE, new HeaderPdfEventHandler());
HtmlConverter.ConvertToPdf(htmlContent, pdfDocument, properties);

Is it possible to loop over the multiple HTML pages, add a new page to the PdfDocument for each one, and end up with a single PDF that has one page per HTML page?

UPDATE

I've been following this example and trying to translate it from Java to C#. I'm trying to use PdfMerger and loop over the HTML pages, but I get the exception "Cannot access a closed stream" on this line:

temp = new PdfDocument(
                    new PdfReader(new RandomAccessSourceFactory().CreateSource(baos), rp));

It looks like it is related to the ByteArrayOutputStream baos instance. Any suggestions? This is my current code:

foreach (var html in htmlList)
{
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    PdfDocument temp = new PdfDocument(new PdfWriter(baos));
    HtmlConverter.ConvertToPdf(html, temp, properties);              
    ReaderProperties rp = new ReaderProperties();
    temp = new PdfDocument(
        new PdfReader(new RandomAccessSourceFactory().CreateSource(baos), rp));
    merger.Merge(temp, 1, temp.GetNumberOfPages());
    temp.Close();
}
pdfDocument.Close();

Solution

  • You are passing RandomAccessSourceFactory a closed stream: the one you just wrote the PDF document into (HtmlConverter.ConvertToPdf closes the temporary document, and with it the underlying stream). RandomAccessSourceFactory expects an input stream that is ready to be read.

    First of all, you should use MemoryStream, which is native to .NET. ByteArrayOutputStream is a class that was ported from Java for internal purposes (although it extends MemoryStream as well). Secondly, you don't have to use RandomAccessSourceFactory; there is a simpler way.

    You can create a new MemoryStream instance from the bytes of the MemoryStream that you used to create the temporary PDF (ToArray still works after the stream has been closed) with the following line:

    baos = new MemoryStream(baos.ToArray());
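
    Why that line helps can be seen with plain .NET, without iText. The following is a minimal sketch; the StreamWriter and the stub content are only stand-ins for the PDF writer and its output and are not part of the original code:

    ```csharp
    using System;
    using System.IO;

    class ClosedStreamDemo
    {
        static void Main()
        {
            MemoryStream baos = new MemoryStream();

            // Disposing the writer closes the underlying stream,
            // much like closing a PdfDocument closes its output stream.
            using (StreamWriter writer = new StreamWriter(baos))
            {
                writer.Write("stub content");
            }

            Console.WriteLine(baos.CanRead);   // False: the stream is closed

            // ToArray is still allowed on a closed MemoryStream,
            // so the bytes can be wrapped in a fresh, readable stream.
            MemoryStream readable = new MemoryStream(baos.ToArray());
            Console.WriteLine(readable.CanRead);   // True
        }
    }
    ```

    Reading from baos directly at the point of the first CanRead check would fail with the same kind of closed-stream error as in the question, while the new stream built from the copied bytes can be read normally.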
    

    As an additional remark, it's better to close the PdfMerger instance directly instead of closing the document: closing PdfMerger closes the underlying document as well.

    All in all, we get the following code that works:

    foreach (var html in htmlList)
    {
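        MemoryStream baos = new MemoryStream();
        PdfDocument temp = new PdfDocument(new PdfWriter(baos));
        // Converting closes temp and, with it, the stream it was written to.
        HtmlConverter.ConvertToPdf(html, temp, properties);
        ReaderProperties rp = new ReaderProperties();
        // Copy the written bytes into a fresh stream that is open for reading.
        baos = new MemoryStream(baos.ToArray());
        temp = new PdfDocument(new PdfReader(baos, rp));
        // Append every page of the temporary PDF to the merged document.
        pdfMerger.Merge(temp, 1, temp.GetNumberOfPages());
        temp.Close();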
    }
    pdfMerger.Close();
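
    The loop above relies on properties, pdfMerger and htmlList being set up beforehand. For completeness, here is a sketch of how the whole thing might be wired together. The class name, method name and outputPath parameter are hypothetical, the header event handler from the question is left out, and htmlList is assumed to contain at least one HTML string:

    ```csharp
    using System.Collections.Generic;
    using System.IO;
    using iText.Html2pdf;
    using iText.Kernel.Pdf;
    using iText.Kernel.Utils;

    public static class HtmlPagesToPdf
    {
        // Hypothetical helper: converts each HTML string to a temporary
        // in-memory PDF and merges all of them into one file at outputPath.
        public static void MergeToFile(IEnumerable<string> htmlList, string outputPath)
        {
            ConverterProperties properties = new ConverterProperties();
            PdfDocument pdfDocument = new PdfDocument(new PdfWriter(outputPath));
            PdfMerger pdfMerger = new PdfMerger(pdfDocument);

            foreach (string html in htmlList)
            {
                // Write the temporary PDF into memory; converting closes the stream.
                MemoryStream baos = new MemoryStream();
                PdfDocument temp = new PdfDocument(new PdfWriter(baos));
                HtmlConverter.ConvertToPdf(html, temp, properties);

                // Reopen the written bytes for reading and merge every page.
                temp = new PdfDocument(new PdfReader(new MemoryStream(baos.ToArray())));
                pdfMerger.Merge(temp, 1, temp.GetNumberOfPages());
                temp.Close();
            }

            // Closing the merger also closes pdfDocument.
            pdfMerger.Close();
        }
    }
    ```

    Note that an HTML page whose content does not fit on a single PDF page will produce several pages in its temporary document; merging from 1 to GetNumberOfPages() keeps all of them, so the result has at least one page per HTML input rather than exactly one.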