The 14 standard PDF fonts and character encoding


I'm having difficulty producing PDFs that make use of the 14 standard PDF fonts. Let's use Times-Roman as an example.

I create a Font dictionary with Subtype Type1 and BaseFont set to Times-Roman. If I omit the Encoding entry from the Font dictionary, or add an Encoding dictionary without a BaseEncoding set, the PDF viewer application should use the font's built-in encoding. For Times-Roman, this is AdobeStandardEncoding.
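For reference, the font dictionary described above might be generated roughly like this. This is only a sketch in Python that builds the dictionary text; `type1_font_dict` is a hypothetical helper name, not part of any PDF library, and a real file would also need object numbers, a FontDescriptor and so on.

```python
def type1_font_dict(base_font, encoding=None):
    """Return the PDF source text of a Type1 font dictionary.

    Hypothetical helper for illustration. If `encoding` is None, no
    /Encoding entry is written, so the viewer should fall back to the
    font's built-in encoding.
    """
    entries = ["/Type /Font", "/Subtype /Type1", f"/BaseFont /{base_font}"]
    if encoding is not None:
        entries.append(f"/Encoding {encoding}")
    return "<< " + " ".join(entries) + " >>"


# Built-in encoding: the Encoding entry is simply left out.
builtin = type1_font_dict("Times-Roman")

# Explicit predefined encoding, for comparison.
winansi = type1_font_dict("Times-Roman", "/WinAnsiEncoding")

print(builtin)
print(winansi)
```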

This works fine for ASCII characters. However, something more exotic like the 'fi' ligature (AdobeStandardEncoding code 174) is not displayed correctly by all PDF viewers:

  • Adobe Reader shows ® (Unicode code point 174, U+00AE) for Times-Roman and Ă for Times-Italic
  • SumatraPDF (wine) shows ® for both fonts
  • Mozilla's PDF.js shows the 'AE' ligature for both fonts
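As a quick sanity check, two of the wrong glyphs above are what code 174 means in other encodings. The sketch below uses Python's `cp1252` and `mac_roman` codecs as stand-ins for WinAnsiEncoding and MacRomanEncoding (an assumption; the PDF encodings are close to these codecs but not defined by them). `interpret_code` is a made-up name. This does not explain the Ă seen for Times-Italic.

```python
def interpret_code(code):
    """Decode a single character code under two common 8-bit encodings.

    cp1252 and mac_roman are used here only as approximations of the
    PDF WinAnsiEncoding and MacRomanEncoding tables.
    """
    raw = bytes([code])
    return {
        "winansi": raw.decode("cp1252"),
        "macroman": raw.decode("mac_roman"),
    }


# 174 is the StandardEncoding code for the 'fi' ligature used in the question.
result = interpret_code(174)
print(result)
```

If a viewer ignored the built-in StandardEncoding and treated the byte as WinAnsi or MacRoman, it would show ® or Æ respectively, which matches what Adobe Reader, SumatraPDF and PDF.js display.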

All other PDF viewers I've tried display the 'fi' ligature properly. They also display the € symbol correctly, which is additionally mapped using the Differences array in the Encoding dictionary (because it is not included in AdobeStandardEncoding):

  • Apple Preview/Skim
  • Ghostscript
  • PDF-XChange Viewer (wine)
  • Foxit Reader (wine)
  • Chromium's internal PDF viewer
  • Evince (homebrew)
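To illustrate the Differences mapping mentioned above, here is a rough Python sketch that writes an Encoding dictionary without BaseEncoding. The function name `encoding_dict` and the choice of code 128 for the Euro glyph are assumptions for the example; the question does not say which code was actually used.

```python
def encoding_dict(differences, base_encoding=None):
    """Return PDF source text for an Encoding dictionary.

    `differences` maps character codes (0-255) to glyph names. In a
    Differences array, a code is written once and followed by the glyph
    names for that code and any directly consecutive codes.
    """
    parts = ["/Type /Encoding"]
    if base_encoding is not None:
        parts.append(f"/BaseEncoding /{base_encoding}")
    if differences:
        items = []
        previous = None
        for code in sorted(differences):
            # Start a new run whenever the codes are not consecutive.
            if previous is None or code != previous + 1:
                items.append(str(code))
            items.append("/" + differences[code])
            previous = code
        parts.append("/Differences [" + " ".join(items) + "]")
    return "<< " + " ".join(parts) + " >>"


# Assumed code 128 for the Euro glyph; BaseEncoding is omitted so the
# differences apply on top of the font's built-in encoding.
euro_only = encoding_dict({128: "Euro"})
print(euro_only)
```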

Opening Adobe Reader's Document Properties window shows:

Times-Roman
    Type: Type1
    Encoding: Custom
    Actual Font: Times-Roman
    Actual Font Type: TrueType

I suspect the fact that a TrueType font is being used instead of a Type1 font might be related to the problem. The PDF specification says:

StandardEncoding: Adobe standard Latin-text encoding. This is the built-in encoding defined in Type 1 Latin-text font programs (but generally not in TrueType font programs).

It also says WinAnsiEncoding and MacRomanEncoding can be used with TrueType fonts. So should we avoid using the built-in or StandardEncoding for the standard 14 fonts? Its effects seem to be undefined. It seems Adobe Reader doesn't bother performing a proper mapping from glyph names to glyphs in the TrueType font being used.

Will providing a Differences array when using the Win or Mac encodings produce proper results? Since these map character codes to Type1/PostScript glyph names, there is no direct link to TrueType glyphs.

EDIT: Hmm, I have a feeling the Font Descriptor Flags might be important for these standard fonts. Up to now I have set the flags to 4 (the Symbolic flag, bit 3) for all fonts, which seemed to work fine for True/OpenType fonts.


Solution

  • It turns out the Flags entry in the FontDescriptor dictionary is important. For Times, the Nonsymbolic flag (bit 6, value 32) needs to be set. The fact that Times is actually being typeset using a TrueType font has nothing to do with it.

    To use the built-in encoding of the font, the Encoding entry of the Type1 Font dictionary should not be set. You may only add an Encoding dictionary (with BaseEncoding omitted) if it contains a non-empty Differences array; otherwise Adobe Reader reports an error.

    With these precautions, the generated PDF displays correctly on all 9 viewer applications listed above.
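The Flags value can be worked out from the bit positions given in the PDF specification, where bit 1 is the lowest-order bit. The following is a minimal sketch; `descriptor_flags` is a hypothetical helper, and setting the Serif flag for Times is my own assumption (the answer only states that Nonsymbolic must be set).

```python
# Font descriptor flag bits (bit 1 is the least significant bit).
FIXED_PITCH = 1 << 0   # bit 1, value 1
SERIF       = 1 << 1   # bit 2, value 2
SYMBOLIC    = 1 << 2   # bit 3, value 4
NONSYMBOLIC = 1 << 5   # bit 6, value 32
ITALIC      = 1 << 6   # bit 7, value 64


def descriptor_flags(*, symbolic=False, serif=False, italic=False,
                     fixed_pitch=False):
    """Combine flag bits into the integer for the /Flags entry.

    Symbolic and Nonsymbolic are treated as mutually exclusive, so
    exactly one of the two is always set.
    """
    flags = SYMBOLIC if symbolic else NONSYMBOLIC
    if serif:
        flags |= SERIF
    if italic:
        flags |= ITALIC
    if fixed_pitch:
        flags |= FIXED_PITCH
    return flags


times_roman = descriptor_flags(serif=True)                 # Nonsymbolic + Serif
times_italic = descriptor_flags(serif=True, italic=True)   # plus Italic
old_value = descriptor_flags(symbolic=True)                # the 4 used before

print(times_roman, times_italic, old_value)
```

The value 4 used originally marks every font as Symbolic, which is presumably why viewers such as Adobe Reader did not apply the standard Latin glyph-name mapping to Times.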