I have been researching how to do this for a couple of hours but have hit a brick wall. I have a PDF file in which one of the objects is a north arrow. It is a simple line graphic (I believe they are called Graphic Markups in Acrobat) that denotes which way is "up". I want to read that line graphic and determine its rotation. The first step I took was to see whether I could enumerate the contents of the PDF with this code:
Imports it = iTextSharp.text
Imports ip = iTextSharp.text.pdf
Dim pdfRdr As New ip.PdfReader("C:\city.pdf")
Dim page As ip.PdfDictionary = pdfRdr.GetPageN(1)
Dim objectReference As ip.PdfIndirectReference = CType(page.Get(ip.PdfName.CONTENTS), ip.PdfIndirectReference)
Dim stream As ip.PRStream = CType(ip.PdfReader.GetPdfObject(objectReference), ip.PRStream)
Dim streamBytes() As Byte = ip.PdfReader.GetStreamBytes(stream)
Dim tokenizer As New ip.PRTokeniser(New ip.RandomAccessFileOrArray(streamBytes))
'Loop through each PDF token
While tokenizer.NextToken
    Debug.Print("token of type={0} and value={1}", tokenizer.TokenType.ToString, tokenizer.StringValue)
End While
I do get some data back but am afraid I just don't understand how to decipher it.
token of type=OTHER and value=q
token of type=NUMBER and value=0.86275
token of type=NUMBER and value=0
token of type=NUMBER and value=0
token of type=NUMBER and value=0.86275
token of type=NUMBER and value=54
token of type=NUMBER and value=30
token of type=OTHER and value=cm
token of type=NAME and value=Fm0
token of type=OTHER and value=Do
token of type=OTHER and value=Q
token of type=OTHER and value=q
token of type=NUMBER and value=1
token of type=NUMBER and value=0
token of type=NUMBER and value=0
token of type=NUMBER and value=1
token of type=NUMBER and value=54
token of type=NUMBER and value=18
token of type=OTHER and value=cm
token of type=NAME and value=Fm1
token of type=OTHER and value=Do
token of type=OTHER and value=Q
I have skinnied down the PDF to show only the graphic that I am interested in.
The test file is here: https://drive.google.com/file/d/1dYFkvLMvznsx6sN-1GsNZVIBtDpgzwCU/view?usp=sharing
Am I going down the right path or is there a different way to get a reference to a graphic markup?
In contrast to the initial impression, the north arrow is not in an annotation of the PDF but instead part of the regular page content. (@Jon created his answer under that initial impression.)
In the PDF shared by the OP, the arrow is part of the immediate page content. In the Adobe Acrobat screenshot shared by the OP, on the other hand, the arrow appears to be in a form XObject (which in turn would be referenced from the immediate page content).
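As for the token dump in the question: each `q ... cm /FmN Do Q` group saves the graphics state, concatenates a transformation matrix to the current one, paints the form XObject named FmN, and restores the state. The six operands of `cm` are the matrix entries a b c d e f. If such a matrix consists only of rotation, uniform scaling and translation, its rotation angle is atan2(b, a). The following is a minimal sketch of that arithmetic under exactly that assumption (no skew, no mirroring); `RotationDegrees` and `ScaleFactor` are hypothetical helper names, not iText API.

```vbnet
Imports System

Module CmMatrix
    ' Hypothetical helper: rotation (in degrees, counterclockwise) encoded in
    ' the a and b operands of a cm instruction, assuming no skew or mirroring.
    Function RotationDegrees(a As Double, b As Double) As Double
        Return Math.Atan2(b, a) * 180.0 / Math.PI
    End Function

    ' Hypothetical helper: uniform scale factor encoded in the same operands.
    Function ScaleFactor(a As Double, b As Double) As Double
        Return Math.Sqrt(a * a + b * b)
    End Function

    Sub Main()
        ' Operands of the first cm in the question: 0.86275 0 0 0.86275 54 30
        Console.WriteLine(RotationDegrees(0.86275, 0))
        Console.WriteLine(ScaleFactor(0.86275, 0))
        ' Operands of the second cm in the question: 1 0 0 1 54 18
        Console.WriteLine(RotationDegrees(1, 0))
    End Sub
End Module
```

Both matrices in the dump have b = c = 0, so neither form XObject is placed rotated; they are only scaled and moved. Any rotation of the arrow therefore has to be read from the path coordinates themselves.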
The following approach should retrieve the vector graphics instructions for either case.
You can retrieve the vector graphics instructions drawing the arrow using the iText parser framework.
Using a current iText 5.5.x version, for example, you need to implement IExtRenderListener and use that implementation in a PdfReaderContentParser execution, e.g.:
' Imports assumed for iTextSharp 5.5.x
Imports System.Text
Imports iTextSharp.text
Imports iTextSharp.text.pdf
Imports iTextSharp.text.pdf.parser

Public Class VectorParser
    Implements IExtRenderListener

    ' Called for every path construction operation (m, l, c, v, y, h, re);
    ' the operations are collected until the path is painted.
    Public Sub ModifyPath(renderInfo As PathConstructionRenderInfo) Implements IExtRenderListener.ModifyPath
        pathInfos.Add(renderInfo)
    End Sub

    ' Called when the collected path is painted; prints the paint operation,
    ' the color and all collected segments transformed by the current matrix.
    Public Function RenderPath(renderInfo As PathPaintingRenderInfo) As parser.Path Implements IExtRenderListener.RenderPath
        Dim GraphicsState As GraphicsState = getGraphicsState(renderInfo)
        Dim ctm As Matrix = GraphicsState.GetCtm()

        If (renderInfo.Operation And PathPaintingRenderInfo.FILL) <> 0 Then
            Console.Write("FILL ({0}) ", ToString(GraphicsState.FillColor))
            If (renderInfo.Operation And PathPaintingRenderInfo.STROKE) <> 0 Then
                Console.Write("and ")
            End If
        End If
        If (renderInfo.Operation And PathPaintingRenderInfo.STROKE) <> 0 Then
            Console.Write("STROKE ({0}) ", ToString(GraphicsState.StrokeColor))
        End If

        Console.Write("the path ")
        For Each pathConstructionRenderInfo In pathInfos
            Select Case pathConstructionRenderInfo.Operation
                Case PathConstructionRenderInfo.MOVETO
                    Console.Write("move to {0} ", ToString(transform(ctm, pathConstructionRenderInfo.SegmentData)))
                Case PathConstructionRenderInfo.CLOSE
                    Console.Write("close {0} ", ToString(transform(ctm, pathConstructionRenderInfo.SegmentData)))
                Case PathConstructionRenderInfo.CURVE_123
                    Console.Write("curve123 {0} ", ToString(transform(ctm, pathConstructionRenderInfo.SegmentData)))
                Case PathConstructionRenderInfo.CURVE_13
                    Console.Write("curve13 {0} ", ToString(transform(ctm, pathConstructionRenderInfo.SegmentData)))
                Case PathConstructionRenderInfo.CURVE_23
                    Console.Write("curve23 {0} ", ToString(transform(ctm, pathConstructionRenderInfo.SegmentData)))
                Case PathConstructionRenderInfo.LINETO
                    Console.Write("line to {0} ", ToString(transform(ctm, pathConstructionRenderInfo.SegmentData)))
                Case PathConstructionRenderInfo.RECT
                    Console.Write("rectangle {0} ", ToString(transform(ctm, expandRectangleCoordinates(pathConstructionRenderInfo.SegmentData))))
            End Select
        Next
        Console.WriteLine()

        ' Start collecting the next path from scratch
        pathInfos.Clear()
        Return Nothing
    End Function

    ' The remaining listener callbacks are not needed for vector graphics
    Public Sub ClipPath(rule As Integer) Implements IExtRenderListener.ClipPath
    End Sub

    Public Sub BeginTextBlock() Implements IRenderListener.BeginTextBlock
    End Sub

    Public Sub RenderText(renderInfo As TextRenderInfo) Implements IRenderListener.RenderText
    End Sub

    Public Sub EndTextBlock() Implements IRenderListener.EndTextBlock
    End Sub

    Public Sub RenderImage(renderInfo As ImageRenderInfo) Implements IRenderListener.RenderImage
    End Sub

    ' Turns the x, y, width, height of a re operation into its four corners
    Function expandRectangleCoordinates(rectangle As IList(Of Single)) As List(Of Single)
        If rectangle.Count < 4 Then
            Return New List(Of Single)
        End If
        Return New List(Of Single)() From
        {
            rectangle(0), rectangle(1),
            rectangle(0) + rectangle(2), rectangle(1),
            rectangle(0) + rectangle(2), rectangle(1) + rectangle(3),
            rectangle(0), rectangle(1) + rectangle(3)
        }
    End Function

    ' Applies the current transformation matrix to each (x, y) pair
    Function transform(ctm As Matrix, coordinates As IList(Of Single)) As List(Of Single)
        Dim result As List(Of Single) = New List(Of Single)
        If Not coordinates Is Nothing Then
            For i = 0 To coordinates.Count - 1 Step 2
                Dim vector As Vector = New Vector(coordinates(i), coordinates(i + 1), 1)
                vector = vector.Cross(ctm)
                result.Add(vector(Vector.I1))
                result.Add(vector(Vector.I2))
            Next
        End If
        Return result
    End Function

    ' Overloads avoids the compiler warning about shadowing Object.ToString
    Public Overloads Function ToString(coordinates As IList(Of Single)) As String
        Dim result As StringBuilder = New StringBuilder()
        result.Append("[ ")
        For i = 0 To coordinates.Count - 1
            result.Append(coordinates(i))
            result.Append(" ")
        Next
        result.Append("]")
        Return result.ToString()
    End Function

    Public Overloads Function ToString(baseColor As BaseColor) As String
        If (baseColor Is Nothing) Then
            Return "DEFAULT"
        End If
        Return String.Format("{0},{1},{2}", baseColor.R, baseColor.G, baseColor.B)
    End Function

    ' The graphics state is held in the non-public field "gs" of
    ' PathPaintingRenderInfo, so it is read via reflection here.
    Function getGraphicsState(renderInfo As PathPaintingRenderInfo) As GraphicsState
        Dim gsField As Reflection.FieldInfo = GetType(PathPaintingRenderInfo).GetField("gs", Reflection.BindingFlags.NonPublic Or Reflection.BindingFlags.Instance)
        Return CType(gsField.GetValue(renderInfo), GraphicsState)
    End Function

    ' Path construction operations collected since the last painting operation
    Dim pathInfos As List(Of PathConstructionRenderInfo) = New List(Of PathConstructionRenderInfo)
End Class
which, when used like this,
Using pdfReader As New PdfReader("test.pdf")
    Dim extRenderListener As IExtRenderListener = New VectorParser
    For page = 1 To pdfReader.NumberOfPages
        Console.Write(vbCrLf + "Page {0}" + vbCrLf + "====" + vbCrLf, page)
        Dim parser As PdfReaderContentParser = New PdfReaderContentParser(pdfReader)
        parser.ProcessContent(page, extRenderListener)
    Next
End Using
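for your shared document returns the following (the values in brackets are x y pairs; the decimal commas merely reflect the culture settings of the machine the code ran on):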
Page 1
====
STROKE (0,0,255) the path move to [ 277,359 434,2797 ] line to [ 311,5242 434,2797 ]
STROKE (0,0,255) the path move to [ 277,3591 434,2797 ] line to [ 315,0443 424,1336 ]
STROKE (0,0,255) the path move to [ 304,2772 425,376 ] line to [ 304,4842 426,6183 ]
STROKE (0,0,255) the path move to [ 304,6913 426,2042 ] line to [ 310,075 425,376 ]
STROKE (0,0,255) the path move to [ 304,6913 426,8254 ] line to [ 307,5902 425,9972 ]
FILL (0,0,255) the path move to [ 303,656 425,3759 ] line to [ 303,656 425,3759 ] line to [ 306,1407 425,1689 ] line to [ 306,1407 425,1689 ]
STROKE (0,0,255) the path move to [ 303,656 425,376 ] line to [ 303,656 425,376 ] line to [ 306,1407 425,1689 ] line to [ 306,1407 425,1689 ] close [ ]
FILL (0,0,255) the path move to [ 306,969 424,9618 ] line to [ 306,969 424,9618 ] line to [ 309,4538 424,7548 ] line to [ 309,4538 424,7548 ]
STROKE (0,0,255) the path move to [ 306,969 424,9619 ] line to [ 306,969 424,9619 ] line to [ 309,4538 424,7548 ] line to [ 309,4538 424,7548 ] close [ ]
FILL (0,0,255) the path move to [ 309,8679 424,9618 ] line to [ 309,8679 424,9618 ] line to [ 312,3527 424,5477 ] line to [ 312,3527 424,5477 ]
STROKE (0,0,255) the path move to [ 309,868 424,9619 ] line to [ 309,868 424,9619 ] line to [ 312,3527 424,5477 ] line to [ 312,3527 424,5477 ] close [ ]
STROKE (0,0,255) the path move to [ 313,1809 424,3407 ] line to [ 314,8374 424,1336 ]
STROKE (0,0,255) the path move to [ 304,2772 425,7901 ] line to [ 309,8679 424,9619 ] line to [ 312,9738 424,7548 ]
STROKE (0,0,255) the path move to [ 304,2772 425,9972 ] line to [ 309,8679 425,1689 ] line to [ 311,5244 424,9619 ]
STROKE (0,0,255) the path move to [ 304,6914 426,8254 ] line to [ 315,0445 424,1336 ]
STROKE (0,0,255) the path move to [ 311,7315 435,7292 ] line to [ 311,7315 432,8303 ]
STROKE (0,0,255) the path move to [ 321,2564 434,2797 ] line to [ 315,4587 434,2797 ]
STROKE (0,0,255) the path move to [ 315,4586 434,2797 ] line to [ 311,7315 434,2797 ]
STROKE (0,0,255) the path move to [ 311,7315 434,6938 ] line to [ 317,7363 434,0727 ] line to [ 311,7315 433,6585 ]
STROKE (0,0,255) the path move to [ 311,7315 434,4868 ] line to [ 314,8374 434,2797 ] line to [ 311,7315 434,2797 ]
STROKE (0,0,255) the path move to [ 310,6963 436,1433 ] line to [ 317,3222 434,9009 ] line to [ 322,2917 434,2797 ] line to [ 317,3222 433,6585 ] line to [ 310,6963 432,6232 ]
STROKE (0,0,255) the path move to [ 311,7315 435,5221 ] line to [ 317,3222 434,6938 ] line to [ 321,0493 434,2797 ] line to [ 317,3222 433,8656 ] line to [ 311,7315 433,0374 ]
STROKE (0,0,255) the path move to [ 311,7315 435,108 ] line to [ 317,3222 434,4868 ] line to [ 319,3928 434,2797 ] line to [ 317,3222 434,2797 ] line to [ 311,7315 433,4515 ]
This looks like a lot of instructions for a simple arrow, but zooming into the PDF, one sees that the arrow is indeed constructed of numerous small lines.
In particular the arrow heads look like someone created them by hand using line segments of different lengths and widths.
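Since the actual goal is the rotation of the arrow, one way to get it from output like the above is to take a long segment and compute its direction with Math.Atan2. The following is only a minimal sketch of that arithmetic: `SegmentAngleDegrees` is a hypothetical helper (not part of iText), the endpoints are copied from the first two STROKE lines of the output, and it is an assumption that those two long segments are the ones of interest and that the page has no /Rotate entry.

```vbnet
Imports System

Module ArrowAngle
    ' Hypothetical helper: direction of the segment (x1, y1) -> (x2, y2) in
    ' degrees, measured counterclockwise from the positive x axis. In the
    ' default PDF coordinate system y grows upwards.
    Function SegmentAngleDegrees(x1 As Double, y1 As Double, x2 As Double, y2 As Double) As Double
        Return Math.Atan2(y2 - y1, x2 - x1) * 180.0 / Math.PI
    End Function

    Sub Main()
        ' Endpoints copied from the first two STROKE lines of the output above
        Console.WriteLine(SegmentAngleDegrees(277.359, 434.2797, 311.5242, 434.2797))
        Console.WriteLine(SegmentAngleDegrees(277.3591, 434.2797, 315.0443, 424.1336))
    End Sub
End Module
```

For these endpoints the first segment is horizontal (0 degrees) and the second comes out at roughly -15 degrees, i.e. about 15 degrees clockwise from the positive x axis. Which of the segments actually denotes "north" cannot be told from the instructions alone; you have to decide that by looking at the drawing.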
The code above is essentially a port of the anonymous ExtRenderListener implementation for Java and iText 5.5.x in this answer.
It is equally simple to implement this using iText 7.
As an aside: Unfortunately the instructions for drawing the arrow are not specifically marked; if there are other vector graphics on the same page, you'll have to filter the results returned by the parser by some specific criteria, e.g. the color (in the case at hand pure RGB blue) or the approximate coordinate range (e.g. inside a given x and y coordinate range only).
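Such a filter could look roughly like the following sketch. `IsPureBlue` and `IsInside` are hypothetical helpers, and the bounding box used in `Main` is merely an assumed region around the coordinates printed above; adapt both to your documents.

```vbnet
Imports System
Imports System.Collections.Generic

Module PathFilter
    ' Hypothetical helper: True if the color components are pure RGB blue.
    Function IsPureBlue(r As Integer, g As Integer, b As Integer) As Boolean
        Return r = 0 AndAlso g = 0 AndAlso b = 255
    End Function

    ' Hypothetical helper: True if every (x, y) pair of the flat coordinate
    ' list lies inside the box minX..maxX by minY..maxY.
    Function IsInside(coordinates As IList(Of Single),
                      minX As Single, minY As Single,
                      maxX As Single, maxY As Single) As Boolean
        For i = 0 To coordinates.Count - 2 Step 2
            If coordinates(i) < minX OrElse coordinates(i) > maxX OrElse
               coordinates(i + 1) < minY OrElse coordinates(i + 1) > maxY Then
                Return False
            End If
        Next
        Return True
    End Function

    Sub Main()
        ' Transformed endpoints of the first stroked path in the output above
        Dim segment As New List(Of Single) From {277.359F, 434.2797F, 311.5242F, 434.2797F}
        ' Assumed region of interest around the arrow
        Console.WriteLine(IsPureBlue(0, 0, 255) AndAlso IsInside(segment, 270, 420, 330, 440))
    End Sub
End Module
```

Inside VectorParser.RenderPath the same checks could be applied to the R, G and B values of the stroke or fill color and to the transformed coordinates before anything is printed or collected.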