Font misalignment during PDF to HTML conversion using the pdf2htmlEX tool

FONT ISSUES WITH PDF TO HTML CONVERSION

All "ti", "fi" and "tt" characters are missing

SAMPLE SCREENSHOT

Font overlapping issue

SAMPLE SCREENSHOT

NOTE: I don't get this issue with Firefox. I get the above issues in the Chrome and Safari browsers.

I AM USING

Version 0.13.6 of pdf2htmlEX
The following command to convert PDF to HTML:

pdf2htmlEX --split-pages 1 --zoom 3 --fit-width 920 --correct-text-visibility 1 --dest-dir "$1" "$2" 2>&1

TRIED

Using the --fallback 1 option solves all of the above problems, but:

The fallback option reduces the clarity of the document.
Tables on the page disappear and are replaced with empty space.

DOUBTS

Could you please explain a bit more about the fallback option?

I have tried the above approach (using fallback). Please suggest a different approach if you know of one that solves the above problem with fonts.

Again, I only get these issues in Chrome and Safari; in Firefox it works fine.

Solution

The above issue occurs only in WebKit-based browsers such as Chrome and Safari, which apply ligatures to this text, whereas Firefox does not.

A ligature is a combination of two or more letters joined into a single glyph (for example, "fi" drawn as one shape).

Root cause

The missing characters are caused by the ligature support in these browsers. Here is how:

1. While converting, the tool uses Poppler to map characters to glyphs in the embedded fonts. When these browsers come across character pairs such as tt, tf, ti, ff and fi, they treat them as ligatures and look for a single glyph for "tt" rather than two separate "t" glyphs.

2. Since the embedded fonts have no corresponding ligature glyphs, the browser skips those characters and renders the rest, so we find the characters missing.

It could be solved by

disabling (turning off) ligatures in these browsers, by embedding CSS in the generated content.
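A minimal sketch of that approach, assuming a POSIX shell and that the converted documents are `*.html` files in the `--dest-dir` directory. The function name `inject_no_ligature_css` and the `id="no-ligatures"` marker are hypothetical (nothing here is a pdf2htmlEX feature); the CSS uses the properties commonly used to switch off ligatures (`font-variant-ligatures`, `font-feature-settings` and their `-webkit-` prefixed forms), so check them against the browser versions you need to support.

```shell
#!/bin/sh
# Hypothetical post-processing helper, NOT part of pdf2htmlEX: inject a
# <style> block that turns off ligatures into every *.html file in a directory.
inject_no_ligature_css() {
  dir=$1
  # Standard and prefixed (older Chrome/Safari) ways of disabling ligatures.
  css='<style type="text/css" id="no-ligatures">
* {
  -webkit-font-variant-ligatures: no-common-ligatures;
  font-variant-ligatures: no-common-ligatures;
  -webkit-font-feature-settings: "liga" 0, "clig" 0;
  font-feature-settings: "liga" 0, "clig" 0;
}
</style>
'
  for f in "$dir"/*.html; do
    [ -f "$f" ] || continue                        # glob matched nothing
    grep -q 'id="no-ligatures"' "$f" && continue   # already injected, stay idempotent
    # Put the style block in front of the first </head> tag; ENVIRON keeps
    # the multi-line CSS intact without awk escape processing.
    CSS="$css" awk '!done && sub(/<\/head>/, ENVIRON["CSS"] "</head>") { done = 1 } { print }' \
      "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  done
}
```

In the conversion script from the question, it would be called right after the pdf2htmlEX command, for example `inject_no_ligature_css "$1"`, since `$1` is the destination directory there. With `--split-pages 1`, the per-page fragments are loaded into the main HTML file, so (under that assumption) styling the main file should be enough.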

For more details, please refer to:

Prevent ligatures in Safari (Mavericks/iOS7) via CSS

Please correct me if I am wrong.