iText7 doesn't accept special HTML characters inside an @page rule


I'm following the iText7 tutorial to convert some HTML to PDF with footers. It seems CssRuleSetParser breaks on the semicolon that terminates &aacute; in the content string:

<html>
  <head>
    <style>
      @page {
        @bottom-right {
          content: "P&aacute;gina " counter(page) " de " counter(pages);
        }
      }
    </style>
  </head>
  <body>
    <p>Minha terra tem palmeiras<br/>Onde canta o sabi&aacute;<br/>As aves que aqui gorjeiam<br/>N&atilde;o gorjeiam como l&aacute;</p>
  </body>
</html>

Special characters in the body work flawlessly.

The Java code is nothing special:

import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.ResponseBody;
import org.springframework.web.bind.annotation.RestController;

import com.itextpdf.html2pdf.ConverterProperties;
import com.itextpdf.html2pdf.HtmlConverter;
import com.itextpdf.styledxmlparser.css.media.MediaDeviceDescription;
import com.itextpdf.styledxmlparser.jsoup.nodes.Document;

@RestController
@RequestMapping("/html2pdf")
public class Html2PdfController {

  @PostMapping(produces = MediaType.APPLICATION_PDF_VALUE)
  public @ResponseBody byte[] convert(@RequestBody String html) throws IOException {
    try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
      var converterProperties = new ConverterProperties(); 
      var mediaDeviceDescription = new MediaDeviceDescription(com.itextpdf.styledxmlparser.css.media.MediaType.PRINT);
      converterProperties.setMediaDeviceDescription(mediaDeviceDescription);
      HtmlConverter.convertToPdf(html, baos, converterProperties);
      return baos.toByteArray();
    }
  }
}

What is the best approach here? Should I try to "sanitize" special characters in the CSS? :-(

Edit:

Forward slashes and backslashes are not accepted either. They are silently ignored:

@bottom-right {
  content: counter(page) " / " counter(pages);
}

This prints, for example:

1  8

Solution

  • You can encode your HTML in UTF-8 (don't forget the <meta> tag so that browsers and iText know about it) and use the non-ASCII characters directly.

    Here is your source file adapted accordingly:

    <html>
    <head>
      <meta charset="UTF-8">
      <style>
        @page {
          @bottom-right {
            content: "Página " counter(page) " / " counter(pages);
          }
        }
      </style>
    </head>
    <body>
    <p>Minha terra tem palmeiras<br/>Onde canta o sabiá<br/>As aves que aqui gorjeiam<br/>Não gorjeiam como lá</p>
    </body>
    </html>
    

    Slashes also work just fine, at least in pdfHTML 4.0.1, the latest release I have. Here is the visual result:

    [image: result]
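
If you cannot control how the incoming HTML is written (for example, it reaches the controller with entities already inside the stylesheet), one possible workaround is to decode the character references inside <style> blocks yourself before calling HtmlConverter. In HTML, the content of <style> is raw text, so references such as &aacute; are not decoded there the way they are in the body. The class below is only a sketch of that idea: StyleEntityDecoder and its methods are hypothetical names and not part of iText, the entity table is a small illustrative subset, and it assumes the stylesheet does not rely on references that decode to quotes or backslashes.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-processing helper (not an iText class). */
final class StyleEntityDecoder {

    // Illustrative subset of named entities; extend it for the characters you need.
    private static final Map<String, String> NAMED = Map.of(
            "aacute", "\u00E1", "Aacute", "\u00C1",
            "atilde", "\u00E3", "Atilde", "\u00C3",
            "eacute", "\u00E9", "ccedil", "\u00E7",
            "otilde", "\u00F5", "oacute", "\u00F3");

    // Captures the opening tag, the CSS text, and the closing tag of each <style> element.
    private static final Pattern STYLE_BLOCK = Pattern.compile(
            "(<style\\b[^>]*>)(.*?)(</style>)",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Matches decimal (&#227;), hexadecimal (&#xE3;) and named (&atilde;) references.
    private static final Pattern ENTITY = Pattern.compile(
            "&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);");

    /** Decodes references inside <style> elements only; the rest of the HTML is left as-is. */
    static String decodeInStyleBlocks(String html) {
        Matcher m = STYLE_BLOCK.matcher(html);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String css = decodeEntities(m.group(2));
            m.appendReplacement(out,
                    Matcher.quoteReplacement(m.group(1) + css + m.group(3)));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Replaces every reference it recognises; unknown ones are kept verbatim. */
    static String decodeEntities(String text) {
        Matcher m = ENTITY.matcher(text);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            m.appendReplacement(out,
                    Matcher.quoteReplacement(decodeOne(m.group(1), m.group())));
        }
        m.appendTail(out);
        return out.toString();
    }

    private static String decodeOne(String ref, String original) {
        try {
            if (ref.startsWith("#x") || ref.startsWith("#X")) {
                return new String(Character.toChars(Integer.parseInt(ref.substring(2), 16)));
            }
            if (ref.startsWith("#")) {
                return new String(Character.toChars(Integer.parseInt(ref.substring(1))));
            }
        } catch (IllegalArgumentException e) {
            return original; // malformed or out-of-range reference: leave it untouched
        }
        return NAMED.getOrDefault(ref, original);
    }
}
```

In the controller from the question, this would amount to passing StyleEntityDecoder.decodeInStyleBlocks(html) to HtmlConverter.convertToPdf instead of html. Writing the characters directly in UTF-8, as suggested above, remains the simpler option when you own the template.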