perl pdf unicode fonts truetype

How to tell whether a particular font includes a particular character in PDF::API2


I use PDF::API2 in my Perl application to embed OCR output behind the corresponding image, allowing the resulting PDF to be searched, as the OCR output can be extracted with pdftotext.

At the moment, as soon as the application sees a non-ASCII character in the OCR output, it switches from the PDF core fonts to a TrueType (TTF) font. This is really hacky, because the core fonts already include most Western European characters. A TTF font is only necessary for Greek, Russian, Japanese, etc.

How can I tell whether a particular font includes a particular character (including the CMAP table so that extraction with pdftotext works)?


Solution

  • Have you tried the glyph-related methods of PDF::API2::Resource::BaseFont, such as glyphByUni?

    http://search.cpan.org/dist/PDF-API2/lib/PDF/API2/Resource/BaseFont.pm#GLYPH_RELATED_METHODS

    Failing that, you could perhaps render the glyph (to a separate document) and measure it.
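
A minimal sketch of the first suggestion, in case it helps. It assumes that glyphByUni() returns the name '.notdef' (or nothing) for a code point the font cannot map; I have not checked that for every font type, so verify it against your fonts first. The helpers font_has_char and font_covers_text are made-up names, not part of PDF::API2. Whether a positive result also guarantees clean extraction with pdftotext depends on the encoding in effect, which this does not test.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of PDF::API2): true if $font appears to map
# the first character of $char to a real glyph.
# Assumption: glyphByUni() yields '.notdef' or nothing for unmapped code points.
sub font_has_char {
    my ($font, $char) = @_;
    my $glyph = $font->glyphByUni(ord $char);
    return defined $glyph && length $glyph && $glyph ne '.notdef';
}

# Hypothetical helper: true only if every character of $text is covered.
sub font_covers_text {
    my ($font, $text) = @_;
    for my $char (split //, $text) {
        return 0 unless font_has_char($font, $char);
    }
    return 1;
}

# Usage sketch; this part runs only if PDF::API2 is installed.
if (eval { require PDF::API2; 1 }) {
    my $pdf  = PDF::API2->new();
    my $core = $pdf->corefont('Helvetica');
    for my $word ('cafe', "caf\x{e9}", "\x{3b1}\x{3b2}") {
        my $points = join ' ', map { sprintf 'U+%04X', ord($_) } split //, $word;
        my $choice = font_covers_text($core, $word) ? 'core font' : 'needs TTF';
        print "$points: $choice\n";
    }
}
```

With something like this you could stay on the core font for each OCR string and fall back to the TTF font only when font_covers_text returns false, instead of switching on the first non-ASCII character.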