Tags: html, css, google-chrome, safari, pdf2htmlex

Font misalignment during PDF-to-HTML conversion with the pdf2htmlEX tool


FONT ISSUES WITH PDF TO HTML CONVERSION

  1. All "ti", "fi" and "tt" character pairs are missing

SAMPLE SCREENSHOT

  2. Font overlapping issue

SAMPLE SCREENSHOT

  • NOTE: I don't get these issues in Firefox; they appear only in Chrome and Safari.

I AM USING

  • pdf2htmlEX version 0.13.6
  • The following command to convert PDF to HTML:

pdf2htmlEX --split-pages 1 --zoom 3 --fit-width 920 --correct-text-visibility 1 --dest-dir "$1" "$2" 2>&1

TRIED

Using the --fallback 1 option solves all of the problems above, but:

  1. It reduces the clarity of the document.
  2. Tables on the page disappear and are replaced with empty space.

QUESTIONS

  1. Could you please explain what the fallback option does?

  2. I have tried the fallback approach above. If you would recommend a different way to fix these font problems, please suggest it.

In short: Chrome and Safari show these issues, while Firefox renders the page correctly.


Solution

  • This issue occurs only in WebKit/Blink-based browsers such as Chrome and Safari, which apply ligatures to this text, whereas Firefox does not.

    A ligature is a combination of two or more letters joined into a single glyph (for example, "fi" drawn as one glyph).

    Root cause

    The missing characters are caused by the ligature support in these browsers. Here is how:

    1. During conversion, pdf2htmlEX uses Poppler to render the PDF and converts characters to glyphs in the fonts it embeds. When these browsers encounter character pairs such as tt, tf, ti, ff and fi, they treat them as ligatures and look for a single glyph for "tt" instead of two separate "t" glyphs.

    2. Since the embedded fonts have no corresponding ligature glyphs, the browsers skip those characters and render the rest, which is why the characters appear to be missing.

    Possible fix

    Disable ligatures in these browsers by embedding CSS in the generated HTML.
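    As a rough sketch of that fix, the shell function below post-processes the pdf2htmlEX output by inserting a style block just before the first </head> of each .html file in the destination directory. The function name disable_ligatures, the variable NO_LIGATURES_CSS and the exact CSS rule are assumptions for illustration, not part of pdf2htmlEX. font-variant-ligatures (plus its older -webkit- prefixed form) and font-feature-settings are standard CSS properties for switching ligatures off, but whether they cure every case reported above is untested.

```shell
#!/bin/sh
# Hypothetical post-processing step (not a pdf2htmlEX feature): inject CSS that
# turns off ligatures into every .html file in a pdf2htmlEX output directory.
# Assumes each page file contains a "</head>" tag; files without one are left unchanged.

# CSS asking the browser not to substitute ligature glyphs (standard + old WebKit prefix).
NO_LIGATURES_CSS='<style type="text/css">*{font-variant-ligatures:none;-webkit-font-variant-ligatures:none;font-feature-settings:"liga" 0,"clig" 0;}</style>'

disable_ligatures() {
  dest_dir=$1
  for f in "$dest_dir"/*.html; do
    [ -e "$f" ] || continue                               # glob matched no files
    grep -q 'font-variant-ligatures:none' "$f" && continue # already patched, skip
    # Insert the style block before the first </head>. Writing to a temp file and
    # moving it back avoids the GNU/BSD differences of "sed -i".
    awk -v css="$NO_LIGATURES_CSS" '
      !done && /<\/head>/ { sub(/<\/head>/, css "</head>"); done = 1 }
      { print }
    ' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  done
}
```

    It would be called right after the conversion command, e.g. disable_ligatures "$1" on the line following the pdf2htmlEX call. Assuming the split page files are loaded into the main .html document, the injected style should apply to them as well. The grep check keeps the step idempotent if the script is run twice on the same directory.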

    For more details please refer:

    Please correct me if I am wrong.