I want to extract the rectangles from a PDF together with their location and fill color. Does anyone have an idea how to extract rectangle shapes from a PDF using iText 7 in VB.NET?
As it turned out in the comments, the shapes the OP is interested in are vector graphics, and not only rectangles but essentially arbitrary shapes. Thus, this answer demonstrates how to extract vector graphics paths and how they are used (stroke, fill, ...) in VB.NET.
For data extraction from PDFs, iText 7 provides a framework that follows the instructions in a PDF content stream and triggers events accordingly. To extract the paths, you therefore first have to implement an event listener (a class implementing the iText IEventListener interface). This implementation then needs to select only the required events (those of type EventType.RENDER_PATH) and extract the desired information from the given PathRenderInfo event data object.
The following example event listener class simply prints the path information to the console:
    Public Class PathListener
        Implements IEventListener

        ' Called by the canvas processor for every event; only path rendering events are handled.
        Public Sub EventOccurred(data As IEventData, type As EventType) Implements IEventListener.EventOccurred
            If type = EventType.RENDER_PATH Then
                Dim PathRenderInfo As PathRenderInfo = CType(data, PathRenderInfo)
                Dim OperationData = GetOperationData(PathRenderInfo)
                Dim PathData = GetPathData(PathRenderInfo)
                Console.WriteLine("{1} - {0}", OperationData, PathData)
            End If
        End Sub

        ' Returning Nothing tells iText that all event types are supported.
        Public Function GetSupportedEvents() As ICollection(Of EventType) Implements IEventListener.GetSupportedEvents
            Return Nothing
        End Function

        ' Describes how the path is used: stroked, filled, and/or as clipping path.
        Function GetOperationData(PathRenderInfo As PathRenderInfo) As String
            Dim OperationBuilder As New StringBuilder
            If PathRenderInfo.GetOperation = PathRenderInfo.NO_OP Then
                OperationBuilder.Append("Invisible")
            End If
            If (PathRenderInfo.GetOperation And PathRenderInfo.STROKE) = PathRenderInfo.STROKE Then
                OperationBuilder.Append("Stroked with ").Append(GetColorData(PathRenderInfo.GetStrokeColor))
                If (PathRenderInfo.GetOperation And PathRenderInfo.FILL) = PathRenderInfo.FILL Then
                    OperationBuilder.Append(" and ")
                End If
            End If
            If (PathRenderInfo.GetOperation And PathRenderInfo.FILL) = PathRenderInfo.FILL Then
                OperationBuilder.Append("Filled with ").Append(GetColorData(PathRenderInfo.GetFillColor))
            End If
            If (PathRenderInfo.IsPathModifiesClippingPath) Then
                OperationBuilder.Append(", clipping")
            End If
            Return OperationBuilder.ToString
        End Function

        ' Returns the color space name followed by the color component values.
        Function GetColorData(Color As Color) As String
            Dim ColorBuilder As New StringBuilder
            If TypeOf Color Is CalGray Then
                ColorBuilder.Append("CalGray")
            ElseIf TypeOf Color Is CalRgb Then
                ColorBuilder.Append("CalRGB")
            ElseIf TypeOf Color Is DeviceCmyk Then
                ColorBuilder.Append("DeviceCmyk")
            ElseIf TypeOf Color Is DeviceGray Then
                ColorBuilder.Append("DeviceGray")
            ElseIf TypeOf Color Is DeviceN Then
                ColorBuilder.Append("DeviceN")
            ElseIf TypeOf Color Is DeviceRgb Then
                ColorBuilder.Append("DeviceRgb")
            ElseIf TypeOf Color Is IccBased Then
                ColorBuilder.Append("IccBased")
            ElseIf TypeOf Color Is Indexed Then
                ColorBuilder.Append("Indexed")
            ElseIf TypeOf Color Is Lab Then
                ColorBuilder.Append("Lab")
            ElseIf TypeOf Color Is PatternColor Then
                ' Pattern colors are reported without component values.
                Return "PatternColor(special)"
            ElseIf TypeOf Color Is Separation Then
                ColorBuilder.Append("Separation")
            End If
            ColorBuilder.Append("(").Append(String.Join(", ", Color.GetColorValue)).Append(")")
            Return ColorBuilder.ToString
        End Function

        ' Describes all subpaths of the path, segment by segment, in transformed coordinates.
        Function GetPathData(PathRenderInfo As PathRenderInfo) As String
            Dim CurrentTransformation = PathRenderInfo.GetCtm
            Dim PathBuilder As New StringBuilder
            Dim FirstSubPath = True
            For Each SubPath In PathRenderInfo.GetPath.GetSubpaths
                If FirstSubPath Then
                    FirstSubPath = False
                    PathBuilder.Append("Path ")
                ElseIf Not (SubPath.IsEmpty Or SubPath.GetSegments.Count = 0) Then
                    PathBuilder.Append(" and ")
                End If
                Dim FirstShape = True
                For Each Shape In SubPath.GetSegments
                    If FirstShape Then
                        FirstShape = False
                        PathBuilder.Append("from ").Append(GetPointData(Shape.GetBasePoints.First, CurrentTransformation))
                    Else
                        PathBuilder.Append(",")
                    End If
                    If TypeOf Shape Is Line Then
                        PathBuilder.Append(" line to ").Append(GetPointData(Shape.GetBasePoints.Last, CurrentTransformation))
                    ElseIf TypeOf Shape Is BezierCurve Then
                        ' Base points 1 and 2 are the control points, base point 3 is the end point.
                        PathBuilder.Append(" curve via ").Append(GetPointData(Shape.GetBasePoints(1), CurrentTransformation))
                        PathBuilder.Append(" and ").Append(GetPointData(Shape.GetBasePoints(2), CurrentTransformation))
                        PathBuilder.Append(" to ").Append(GetPointData(Shape.GetBasePoints(3), CurrentTransformation))
                    End If
                Next
                If SubPath.IsClosed Then
                    PathBuilder.Append(" (closed)")
                End If
            Next
            Return PathBuilder.ToString
        End Function

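        ' Applies the current transformation matrix (CTM) to a point and formats the result.
        Function GetPointData(Point As Point, CurrentTransformation As Matrix) As String
            ' PDF transforms points as row vectors, [x y 1] x CTM. Writing the point as a translation
            ' matrix and multiplying it by the CTM (in this order) leaves the result in the third row.
            Dim Transformed = New Matrix(CSng(Point.GetX), CSng(Point.GetY)).Multiply(CurrentTransformation)
            Return String.Format(CultureInfo.InvariantCulture, "({0}, {1})", Transformed.Get(Matrix.I31), Transformed.Get(Matrix.I32))
        End Function
    End Class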
Using this event listener you can inspect the pages of your document:
    Using PdfDocument As New PdfDocument(New PdfReader(...))
        Dim PathListener As New PathListener
        Dim PdfCanvasProcessor As New PdfCanvasProcessor(PathListener)
        ' Page numbers in iText are 1-based.
        For page As Integer = 1 To PdfDocument.GetNumberOfPages
            PdfCanvasProcessor.ProcessPageContent(PdfDocument.GetPage(page))
        Next
    End Using
The output then may look like this:
Path from (51.2, 723.57) line to (512.17, 723.57), line to (512.17, 736.97), line to (51.2, 736.97) (closed) - Invisible, clipping
Path from (108.6, 516.6) curve via (108.6, 569.29) and (160.18, 612) to (223.8, 612), curve via (287.42, 612) and (339, 569.29) to (339, 516.6), curve via (339, 463.91) and (287.42, 421.2) to (223.8, 421.2), curve via (160.18, 421.2) and (108.6, 463.91) to (108.6, 516.6) (closed) - Filled with DeviceRgb(0.31, 0.506, 0.741)
Path from (174.89, 545.13) curve via (174.89, 550.62) and (180.27, 555.07) to (186.89, 555.07), curve via (193.52, 555.07) and (198.89, 550.62) to (198.89, 545.13), curve via (198.89, 539.64) and (193.52, 535.19) to (186.89, 535.19), curve via (180.27, 535.19) and (174.89, 539.64) to (174.89, 545.13) (closed) and from (248.71, 545.13) curve via (248.71, 550.62) and (254.08, 555.07) to (260.71, 555.07), curve via (267.33, 555.07) and (272.71, 550.62) to (272.71, 545.13), curve via (272.71, 539.64) and (267.33, 535.19) to (260.71, 535.19), curve via (254.08, 535.19) and (248.71, 539.64) to (248.71, 545.13) (closed) - Filled with DeviceRgb(0.251, 0.408, 0.596)
Path from (174.89, 545.13) curve via (174.89, 550.62) and (180.27, 555.07) to (186.89, 555.07), curve via (193.52, 555.07) and (198.89, 550.62) to (198.89, 545.13), curve via (198.89, 539.64) and (193.52, 535.19) to (186.89, 535.19), curve via (180.27, 535.19) and (174.89, 539.64) to (174.89, 545.13) (closed) and from (248.71, 545.13) curve via (248.71, 550.62) and (254.08, 555.07) to (260.71, 555.07), curve via (267.33, 555.07) and (272.71, 550.62) to (272.71, 545.13), curve via (272.71, 539.64) and (267.33, 535.19) to (260.71, 535.19), curve via (254.08, 535.19) and (248.71, 539.64) to (248.71, 545.13) (closed) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
Path from (161.36, 475) curve via (202.99, 451.32) and (244.56, 451.32) to (286.09, 475) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
Path from (108.6, 516.6) curve via (108.6, 569.29) and (160.18, 612) to (223.8, 612), curve via (287.42, 612) and (339, 569.29) to (339, 516.6), curve via (339, 463.91) and (287.42, 421.2) to (223.8, 421.2), curve via (160.18, 421.2) and (108.6, 463.91) to (108.6, 516.6) (closed) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
Path from (51.2, 565.15) line to (512.17, 565.15), line to (512.17, 578.55), line to (51.2, 578.55) (closed) - Invisible, clipping
Path from (147.8, 556.2) curve via (147.8, 608.89) and (199.38, 651.6) to (263, 651.6), curve via (326.62, 651.6) and (378.2, 608.89) to (378.2, 556.2), curve via (378.2, 503.51) and (326.62, 460.8) to (263, 460.8), curve via (199.38, 460.8) and (147.8, 503.51) to (147.8, 556.2) (closed) - Filled with DeviceRgb(0.31, 0.506, 0.741)
Path from (214.09, 584.73) curve via (214.09, 590.22) and (219.47, 594.67) to (226.09, 594.67), curve via (232.72, 594.67) and (238.09, 590.22) to (238.09, 584.73), curve via (238.09, 579.24) and (232.72, 574.79) to (226.09, 574.79), curve via (219.47, 574.79) and (214.09, 579.24) to (214.09, 584.73) (closed) and from (287.91, 584.73) curve via (287.91, 590.22) and (293.28, 594.67) to (299.91, 594.67), curve via (306.53, 594.67) and (311.91, 590.22) to (311.91, 584.73), curve via (311.91, 579.24) and (306.53, 574.79) to (299.91, 574.79), curve via (293.28, 574.79) and (287.91, 579.24) to (287.91, 584.73) (closed) - Filled with DeviceRgb(0.251, 0.408, 0.596)
Path from (214.09, 584.73) curve via (214.09, 590.22) and (219.47, 594.67) to (226.09, 594.67), curve via (232.72, 594.67) and (238.09, 590.22) to (238.09, 584.73), curve via (238.09, 579.24) and (232.72, 574.79) to (226.09, 574.79), curve via (219.47, 574.79) and (214.09, 579.24) to (214.09, 584.73) (closed) and from (287.91, 584.73) curve via (287.91, 590.22) and (293.28, 594.67) to (299.91, 594.67), curve via (306.53, 594.67) and (311.91, 590.22) to (311.91, 584.73), curve via (311.91, 579.24) and (306.53, 574.79) to (299.91, 574.79), curve via (293.28, 574.79) and (287.91, 579.24) to (287.91, 584.73) (closed) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
Path from (200.56, 514.6) curve via (242.19, 490.92) and (283.76, 490.92) to (325.29, 514.6) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
Path from (147.8, 556.2) curve via (147.8, 608.89) and (199.38, 651.6) to (263, 651.6), curve via (326.62, 651.6) and (378.2, 608.89) to (378.2, 556.2), curve via (378.2, 503.51) and (326.62, 460.8) to (263, 460.8), curve via (199.38, 460.8) and (147.8, 503.51) to (147.8, 556.2) (closed) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
Path from (103, 398.43) line to (487, 398.43), line to (487, 605.83), line to (103, 605.83) (closed) - Filled with DeviceRgb(0.31, 0.506, 0.741)
Path from (103, 398.43) line to (487, 398.43), line to (487, 605.83), line to (103, 605.83) (closed) - Stroked with DeviceGray(0.525)
Path from (229.4, 344.2) curve via (229.4, 363.31) and (246.59, 378.8) to (267.8, 378.8), curve via (289.01, 378.8) and (306.2, 363.31) to (306.2, 344.2), curve via (306.2, 325.09) and (289.01, 309.6) to (267.8, 309.6), curve via (246.59, 309.6) and (229.4, 325.09) to (229.4, 344.2) (closed) - Filled with DeviceRgb(0.31, 0.506, 0.741)
Path from (229.4, 344.2) curve via (229.4, 363.31) and (246.59, 378.8) to (267.8, 378.8), curve via (289.01, 378.8) and (306.2, 363.31) to (306.2, 344.2), curve via (306.2, 325.09) and (289.01, 309.6) to (267.8, 309.6), curve via (246.59, 309.6) and (229.4, 325.09) to (229.4, 344.2) (closed) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
Path from (237.4, 256.57) line to (266.74, 256.57), line to (275.8, 283), line to (284.86, 256.57), line to (314.2, 256.57), line to (290.47, 240.23), line to (299.53, 213.8), line to (275.8, 230.14), line to (252.07, 213.8), line to (261.13, 240.23) (closed) - Filled with DeviceRgb(0.31, 0.506, 0.741)
Path from (237.4, 256.57) line to (266.74, 256.57), line to (275.8, 283), line to (284.86, 256.57), line to (314.2, 256.57), line to (290.47, 240.23), line to (299.53, 213.8), line to (275.8, 230.14), line to (252.07, 213.8), line to (261.13, 240.23) (closed) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
Path from (163, 432.4) curve via (163, 485.09) and (216.9, 527.8) to (283.4, 527.8), curve via (349.9, 527.8) and (403.8, 485.09) to (403.8, 432.4), curve via (403.8, 379.71) and (349.9, 337) to (283.4, 337), curve via (216.9, 337) and (163, 379.71) to (163, 432.4) (closed) - Filled with DeviceRgb(0.31, 0.506, 0.741)
Path from (232.29, 460.93) curve via (232.29, 466.42) and (237.9, 470.87) to (244.83, 470.87), curve via (251.75, 470.87) and (257.37, 466.42) to (257.37, 460.93), curve via (257.37, 455.44) and (251.75, 450.99) to (244.83, 450.99), curve via (237.9, 450.99) and (232.29, 455.44) to (232.29, 460.93) (closed) and from (309.43, 460.93) curve via (309.43, 466.42) and (315.05, 470.87) to (321.97, 470.87), curve via (328.9, 470.87) and (334.51, 466.42) to (334.51, 460.93), curve via (334.51, 455.44) and (328.9, 450.99) to (321.97, 450.99), curve via (315.05, 450.99) and (309.43, 455.44) to (309.43, 460.93) (closed) - Filled with DeviceRgb(0.251, 0.408, 0.596)
Path from (232.29, 460.93) curve via (232.29, 466.42) and (237.9, 470.87) to (244.83, 470.87), curve via (251.75, 470.87) and (257.37, 466.42) to (257.37, 460.93), curve via (257.37, 455.44) and (251.75, 450.99) to (244.83, 450.99), curve via (237.9, 450.99) and (232.29, 455.44) to (232.29, 460.93) (closed) and from (309.43, 460.93) curve via (309.43, 466.42) and (315.05, 470.87) to (321.97, 470.87), curve via (328.9, 470.87) and (334.51, 466.42) to (334.51, 460.93), curve via (334.51, 455.44) and (328.9, 450.99) to (321.97, 450.99), curve via (315.05, 450.99) and (309.43, 455.44) to (309.43, 460.93) (closed) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
Path from (218.14, 390.8) curve via (261.65, 367.12) and (305.1, 367.12) to (348.51, 390.8) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
Path from (163, 432.4) curve via (163, 485.09) and (216.9, 527.8) to (283.4, 527.8), curve via (349.9, 527.8) and (403.8, 485.09) to (403.8, 432.4), curve via (403.8, 379.71) and (349.9, 337) to (283.4, 337), curve via (216.9, 337) and (163, 379.71) to (163, 432.4) (closed) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
Path from (51.2, 60.025) line to (420.15, 60.025), line to (420.15, 736.975), line to (51.2, 736.975) (closed) - Invisible, clipping
Path from (255.48, 564.55) line to (368.53, 564.55), line to (368.53, 579.15), line to (255.48, 579.15) (closed) - Filled with DeviceRgb(0.31, 0.506, 0.741)
Path from (255.48, 550.13) line to (368.53, 550.13), line to (368.53, 564.755), line to (255.48, 564.755) (closed) - Filled with DeviceRgb(0.863, 0.902, 0.945)
Path from (255.48, 521.32) line to (368.53, 521.32), line to (368.53, 535.92), line to (255.48, 535.92) (closed) - Filled with DeviceRgb(0.863, 0.902, 0.945)
Path from (255.48, 492.52) line to (368.53, 492.52), line to (368.53, 507.12), line to (255.48, 507.12) (closed) - Filled with DeviceRgb(0.863, 0.902, 0.945)
Path from (255.48, 463.73) line to (368.53, 463.73), line to (368.53, 478.33), line to (255.48, 478.33) (closed) - Filled with DeviceRgb(0.863, 0.902, 0.945)
Path from (255.48, 434.9) line to (368.53, 434.9), line to (368.53, 449.525), line to (255.48, 449.525) (closed) - Filled with DeviceRgb(0.863, 0.902, 0.945)
Path from (255.48, 406.1) line to (368.53, 406.1), line to (368.53, 420.7), line to (255.48, 420.7) (closed) - Filled with DeviceRgb(0.863, 0.902, 0.945)
Path from (51.2, 565.15) line to (420.15, 565.15), line to (420.15, 578.55), line to (51.2, 578.55) (closed) - Invisible, clipping
Path from (49.2, 57.825) line to (422.35, 57.825), line to (422.35, 738.975), line to (49.2, 738.975) (closed) - Invisible, clipping
Path from (255.18, 579.45) line to (255.18, 405.8) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (255.08, 405.7) line to (256.08, 405.7), line to (256.08, 579.55), line to (255.08, 579.55) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (368.02, 578.45) line to (368.02, 405.8) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (367.93, 405.7) line to (368.93, 405.7), line to (368.93, 578.55), line to (367.93, 578.55) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.18, 579.45) line to (368.83, 579.45) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.08, 578.55) line to (368.93, 578.55), line to (368.93, 579.55), line to (256.08, 579.55) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.18, 565.05) line to (368.83, 565.05) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.08, 564.15) line to (368.93, 564.15), line to (368.93, 565.15), line to (256.08, 565.15) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.18, 550.63) line to (368.83, 550.63) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.08, 549.72) line to (368.93, 549.72), line to (368.93, 550.72), line to (256.08, 550.72) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.18, 536.22) line to (368.83, 536.22) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.08, 535.33) line to (368.93, 535.33), line to (368.93, 536.33), line to (256.08, 536.33) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.18, 521.82) line to (368.83, 521.82) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.08, 520.92) line to (368.93, 520.92), line to (368.93, 521.92), line to (256.08, 521.92) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.18, 507.42) line to (368.83, 507.42) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.08, 506.52) line to (368.93, 506.52), line to (368.93, 507.52), line to (256.08, 507.52) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.18, 493.02) line to (368.83, 493.02) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.08, 492.13) line to (368.93, 492.13), line to (368.93, 493.13), line to (256.08, 493.13) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.18, 478.63) line to (368.83, 478.63) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.08, 477.73) line to (368.93, 477.73), line to (368.93, 478.73), line to (256.08, 478.73) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.18, 464.23) line to (368.83, 464.23) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.08, 463.32) line to (368.93, 463.32), line to (368.93, 464.32), line to (256.08, 464.32) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.18, 449.82) line to (368.83, 449.82) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.08, 448.9) line to (368.93, 448.9), line to (368.93, 449.925), line to (256.08, 449.925) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.18, 435.4) line to (368.83, 435.4) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.08, 434.5) line to (368.93, 434.5), line to (368.93, 435.5), line to (256.08, 435.5) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.18, 421) line to (368.83, 421) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.08, 420.1) line to (368.93, 420.1), line to (368.93, 421.1), line to (256.08, 421.1) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.18, 406.6) line to (368.83, 406.6) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.08, 405.7) line to (368.93, 405.7), line to (368.93, 406.7), line to (256.08, 406.7) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (280.6, 359.3) line to (287.93, 380.62), line to (307.13, 393.8), line to (330.87, 393.8), line to (350.07, 380.62), line to (357.4, 359.3), line to (350.07, 337.98), line to (330.87, 324.8), line to (307.13, 324.8), line to (287.93, 337.98) (closed) - Filled with DeviceRgb(0.31, 0.506, 0.741)
Path from (280.6, 359.3) line to (287.93, 380.62), line to (307.13, 393.8), line to (330.87, 393.8), line to (350.07, 380.62), line to (357.4, 359.3), line to (350.07, 337.98), line to (330.87, 324.8), line to (307.13, 324.8), line to (287.93, 337.98) (closed) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
As an aside, the above code uses the following imports:
    Imports System.Collections.Generic
    Imports System.Globalization
    Imports System.Linq
    Imports System.Text
    Imports iText.Kernel.Colors
    Imports iText.Kernel.Geom
    Imports iText.Kernel.Pdf
    Imports iText.Kernel.Pdf.Canvas.Parser
    Imports iText.Kernel.Pdf.Canvas.Parser.Data
    Imports iText.Kernel.Pdf.Canvas.Parser.Listener
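Since the question originally asked for rectangles with their location and fill color, the same event mechanism can be narrowed down to that case. The following is only a minimal sketch under a few assumptions: the class name RectangleListener, the helper names and the Tolerance value are made up for illustration and are not part of iText; a "rectangle" is taken to be a filled subpath of straight lines with four corners whose edges are parallel to the page axes after applying the transformation matrix. Rotated or skewed rectangles, and rectangles drawn as thick stroked lines, are not detected.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Globalization
Imports iText.Kernel.Geom
Imports iText.Kernel.Pdf.Canvas.Parser
Imports iText.Kernel.Pdf.Canvas.Parser.Data
Imports iText.Kernel.Pdf.Canvas.Parser.Listener

' Hypothetical listener (not part of iText): reports filled, axis-parallel rectangles only.
Public Class RectangleListener
    Implements IEventListener

    ' Assumed tolerance (in page units) for treating two coordinates as equal.
    Private Const Tolerance As Double = 0.01

    Public Sub EventOccurred(data As IEventData, type As EventType) Implements IEventListener.EventOccurred
        If type <> EventType.RENDER_PATH Then Return
        Dim Info = CType(data, PathRenderInfo)
        ' Skip paths that are not filled (stroke-only, clipping-only, invisible).
        If (Info.GetOperation And PathRenderInfo.FILL) <> PathRenderInfo.FILL Then Return

        For Each Part In Info.GetPath.GetSubpaths
            Dim Corners = GetCorners(Part, Info.GetCtm)
            If Corners Is Nothing OrElse Not IsAxisParallelRectangle(Corners) Then Continue For
            ' Corners 0 and 2 are opposite corners, so they span the rectangle.
            Dim X = Math.Min(Corners(0).GetX, Corners(2).GetX)
            Dim Y = Math.Min(Corners(0).GetY, Corners(2).GetY)
            Dim Width = Math.Abs(Corners(0).GetX - Corners(2).GetX)
            Dim Height = Math.Abs(Corners(0).GetY - Corners(2).GetY)
            Console.WriteLine(String.Format(CultureInfo.InvariantCulture,
                "Rectangle at ({0}, {1}), size {2} x {3}, fill ({4})",
                X, Y, Width, Height, String.Join(", ", Info.GetFillColor.GetColorValue)))
        Next
    End Sub

    Public Function GetSupportedEvents() As ICollection(Of EventType) Implements IEventListener.GetSupportedEvents
        ' Ask the processor for path events only.
        Return New HashSet(Of EventType) From {EventType.RENDER_PATH}
    End Function

    ' Returns the four transformed corners of a straight-line subpath, or Nothing otherwise.
    Private Function GetCorners(Part As Subpath, Ctm As Matrix) As List(Of Point)
        Dim Corners As New List(Of Point)
        For Each Segment In Part.GetSegments
            If Not (TypeOf Segment Is Line) Then Return Nothing
            Dim BasePoints = Segment.GetBasePoints
            If Corners.Count = 0 Then Corners.Add(Transform(BasePoints(0), Ctm))
            Corners.Add(Transform(BasePoints(BasePoints.Count - 1), Ctm))
        Next
        ' Drop an explicit fifth point that merely returns to the start point.
        If Corners.Count = 5 AndAlso Near(Corners(0).GetX, Corners(4).GetX) AndAlso Near(Corners(0).GetY, Corners(4).GetY) Then
            Corners.RemoveAt(4)
        End If
        Return If(Corners.Count = 4, Corners, Nothing)
    End Function

    ' Applies the transformation matrix to a point: [x y 1] x CTM.
    Private Function Transform(P As Point, Ctm As Matrix) As Point
        Dim M = New Matrix(CSng(P.GetX), CSng(P.GetY)).Multiply(Ctm)
        Return New Point(CDbl(M.Get(Matrix.I31)), CDbl(M.Get(Matrix.I32)))
    End Function

    Private Function Near(A As Double, B As Double) As Boolean
        Return Math.Abs(A - B) <= Tolerance
    End Function

    ' True if the four edges alternate between horizontal and vertical.
    Private Function IsAxisParallelRectangle(Corners As List(Of Point)) As Boolean
        Dim FirstVertical = Near(Corners(0).GetX, Corners(1).GetX)
        For Index = 0 To 3
            Dim P = Corners(Index)
            Dim Q = Corners((Index + 1) Mod 4)
            Dim Vertical = Near(P.GetX, Q.GetX)
            Dim Horizontal = Near(P.GetY, Q.GetY)
            ' Reject degenerate edges and edges that are neither horizontal nor vertical.
            If Vertical = Horizontal Then Return False
            ' Even edges must have the orientation of the first edge, odd edges the other one.
            If Vertical <> (FirstVertical Xor (Index Mod 2 = 1)) Then Return False
        Next
        Return True
    End Function
End Class
```

This listener is passed to a PdfCanvasProcessor in the same way as the PathListener above. In the sample output, entries of the form "from ... line to ..., line to ..., line to ... (closed) - Filled with ..." are the kind of paths it is meant to pick up, while the curves, the star and decagon shapes, and the clipping-only paths are skipped.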