How to interpret iTextSharp TJ/TF result


I'm automating a process in PowerShell, using the iTextSharp library, to extract information from several PDF documents.

For this portion of the PDF content:

[Image: PDF content portion]

iTextSharp returns this result:

[(1)-1688.21(1)-492.975(0)-493.019(0)]TJ
[(5)-493.019(0)-17728.1(2)]TJ

I can extract the literal values with some regex manipulation:

$line -replace "^\[\(|\)\]TJ$", "" -split "\)\-?\d+\.?\d*\(" -join ""

but with this method alone the result is:

1000
502

Of course, these results are incomplete, and I need to read and parse the content more precisely. I suspect that the numbers between the literal characters (e.g. -1688.21, -492.975, ...) may be useful, but I couldn't find an explanation of these parameters.

What do they represent?


Solution

  • When you are wondering about details of the PDF format, have a look at the PDF specification, ISO 32000. Its table of text-showing operators describes TJ as follows:

    Operands: array
    Operator: TJ
    Description: Show one or more text strings, allowing individual glyph positioning. Each element of array shall be either a string or a number. If the element is a string, this operator shall show the string. If it is a number, the operator shall adjust the text position by that amount; that is, it shall translate the text matrix, Tm. The number shall be expressed in thousandths of a unit of text space (see 9.4.4, "Text Space Details"). This amount shall be subtracted from the current horizontal or vertical coordinate, depending on the writing mode. In the default coordinate system, a positive adjustment has the effect of moving the next glyph painted either to the left or down by the given amount. Figure 46 shows an example of the effect of passing offsets to TJ.

    (ISO 32000-1, Table 109 – Text-showing operators)

    Thus, to answer your question:

    I suspect that the numbers between the literal characters (e.g. -1688.21, -492.975, ...) may be useful, but I couldn't find an explanation of these parameters.

    What do they represent?

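    For each such number, the operator shifts the text position by that amount, expressed in thousandths of a unit of text space. The amount is subtracted from the current horizontal or vertical coordinate, depending on the writing mode. In the default coordinate system with horizontal writing, a negative number such as -1688.21 therefore moves the next glyph to the right, that is, it leaves extra space before that glyph.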
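
As a rough illustration of how these numbers can help when rebuilding the text, the sketch below walks one TJ array and inserts a separator wherever an adjustment opens a large gap. The function name `Convert-TjLine`, its parameters and the default threshold of 1000 are assumptions made for this example; they are not part of iTextSharp or of the PDF specification. The pattern is also simplified: it ignores escaped parentheses inside strings and hexadecimal strings.

```powershell
# Hypothetical helper: rebuilds the text of one TJ line and turns large
# rightward adjustments (gaps) into a separator.
function Convert-TjLine {
    param(
        [string]$Line,
        [double]$GapThreshold = 1000,  # assumed cut-off, in thousandths of a text-space unit
        [string]$Separator = ' '
    )

    $invariant = [System.Globalization.CultureInfo]::InvariantCulture
    $text = ''

    # Each match is either a (string) element or a numeric adjustment.
    $pattern = '\((?<s>[^)]*)\)|(?<n>-?\d+(?:\.\d+)?)'
    foreach ($m in [regex]::Matches($Line, $pattern)) {
        if ($m.Groups['s'].Success) {
            # String element: the glyphs that are actually shown.
            $text += $m.Groups['s'].Value
        }
        else {
            $adjust = [double]::Parse($m.Groups['n'].Value, $invariant)
            # The amount is subtracted from the position, so a negative
            # number moves the next glyph to the right.
            if ((0 - $adjust) -ge $GapThreshold) {
                $text += $Separator
            }
        }
    }

    $text
}

Convert-TjLine '[(1)-1688.21(1)-492.975(0)-493.019(0)]TJ'
Convert-TjLine '[(5)-493.019(0)-17728.1(2)]TJ'
```

A suitable threshold depends on the font and layout of the document, so it would need tuning against real pages. Here the adjustments near -493 are treated as ordinary spacing between digits and only the larger ones as gaps.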