vb.net pdf shapes itext7 data-extraction

How can I extract rectangles from a PDF using iText 7 in VB.NET?


I want to extract the rectangles from a PDF together with their location and fill color. Does anyone have an idea how to extract rectangle shapes from a PDF using iText 7 in VB.NET?


Solution

  • As it turned out in the comments, the shapes the OP is interested in are vector graphics, and not only rectangles but essentially arbitrary shapes. Thus, this answer demonstrates how to extract vector graphics paths and how they are used (stroke/fill/...) with VB.NET.

    For data extraction from PDFs, iText 7 provides a framework that follows the instructions in a PDF content stream and triggers events accordingly. To extract the paths, you therefore first have to implement an event listener (a class implementing the iText IEventListener interface). This implementation then selects only the required events (those of type EventType.RENDER_PATH) and extracts the desired information from the accompanying PathRenderInfo event data object.

    The following example event listener class simply prints the path information to the console:

    Public Class PathListener
        Implements IEventListener
    
        Public Sub EventOccurred(data As IEventData, type As EventType) Implements IEventListener.EventOccurred
            If type = EventType.RENDER_PATH Then
                Dim PathRenderInfo As PathRenderInfo = CType(data, PathRenderInfo)
                Dim OperationData = GetOperationData(PathRenderInfo)
                Dim PathData = GetPathData(PathRenderInfo)
                Console.WriteLine("{0} - {1}", PathData, OperationData)
            End If
        End Sub
    
        Public Function GetSupportedEvents() As ICollection(Of EventType) Implements IEventListener.GetSupportedEvents
            ' Nothing means "all event types"; EventOccurred itself filters for RENDER_PATH
            Return Nothing
        End Function
    
        Function GetOperationData(PathRenderInfo As PathRenderInfo) As String
            Dim OperationBuilder As New StringBuilder
            If PathRenderInfo.GetOperation = PathRenderInfo.NO_OP Then
                OperationBuilder.Append("Invisible")
            End If
            If (PathRenderInfo.GetOperation And PathRenderInfo.STROKE) = PathRenderInfo.STROKE Then
                OperationBuilder.Append("Stroked with ").Append(GetColorData(PathRenderInfo.GetStrokeColor))
                If (PathRenderInfo.GetOperation And PathRenderInfo.FILL) = PathRenderInfo.FILL Then
                    OperationBuilder.Append(" and ")
                End If
            End If
            If (PathRenderInfo.GetOperation And PathRenderInfo.FILL) = PathRenderInfo.FILL Then
                OperationBuilder.Append("Filled with ").Append(GetColorData(PathRenderInfo.GetFillColor))
            End If
            If (PathRenderInfo.IsPathModifiesClippingPath) Then
                OperationBuilder.Append(", clipping")
            End If
            Return OperationBuilder.ToString
        End Function
    
        Function GetColorData(Color As Color) As String
            Dim ColorBuilder As New StringBuilder
            If TypeOf Color Is CalGray Then
                ColorBuilder.Append("CalGray")
            ElseIf TypeOf Color Is CalRgb Then
                ColorBuilder.Append("CalRGB")
            ElseIf TypeOf Color Is DeviceCmyk Then
                ColorBuilder.Append("DeviceCmyk")
            ElseIf TypeOf Color Is DeviceGray Then
                ColorBuilder.Append("DeviceGray")
            ElseIf TypeOf Color Is DeviceN Then
                ColorBuilder.Append("DeviceN")
            ElseIf TypeOf Color Is DeviceRgb Then
                ColorBuilder.Append("DeviceRgb")
            ElseIf TypeOf Color Is IccBased Then
                ColorBuilder.Append("IccBased")
            ElseIf TypeOf Color Is Indexed Then
                ColorBuilder.Append("Indexed")
            ElseIf TypeOf Color Is Lab Then
                ColorBuilder.Append("Lab")
            ElseIf TypeOf Color Is PatternColor Then
                Return "PatternColor(special)"
            ElseIf TypeOf Color Is Separation Then
                ColorBuilder.Append("Separation")
            End If
            ColorBuilder.Append("(").Append(String.Join(", ", Color.GetColorValue)).Append(")")
            Return ColorBuilder.ToString
        End Function
    
        Function GetPathData(PathRenderInfo As PathRenderInfo) As String
            Dim CurrentTransformation = PathRenderInfo.GetCtm
            Dim PathBuilder As New StringBuilder
            Dim FirstSubPath = True
            For Each SubPath In PathRenderInfo.GetPath.GetSubpaths
                If FirstSubPath Then
                    FirstSubPath = False
                    PathBuilder.Append("Path ")
                ElseIf Not (SubPath.IsEmpty Or SubPath.GetSegments.Count = 0) Then
                    PathBuilder.Append(" and ")
                End If
                Dim FirstShape = True
                For Each Shape In SubPath.GetSegments
                    If FirstShape Then
                        FirstShape = False
                        PathBuilder.Append("from ").Append(GetPointData(Shape.GetBasePoints.First, CurrentTransformation))
                    Else
                        PathBuilder.Append(",")
                    End If
                    If TypeOf Shape Is Line Then
                        PathBuilder.Append(" line to ").Append(GetPointData(Shape.GetBasePoints.Last, CurrentTransformation))
                    ElseIf TypeOf Shape Is BezierCurve Then
                        PathBuilder.Append(" curve via ").Append(GetPointData(Shape.GetBasePoints(1), CurrentTransformation))
                        PathBuilder.Append(" and ").Append(GetPointData(Shape.GetBasePoints(2), CurrentTransformation))
                        PathBuilder.Append(" to ").Append(GetPointData(Shape.GetBasePoints(3), CurrentTransformation))
                    End If
                Next
                If SubPath.IsClosed Then
                    PathBuilder.Append(" (closed)")
                End If
            Next
            Return PathBuilder.ToString
        End Function
    
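        Function GetPointData(Point As Point, CurrentTransformation As Matrix) As String
            ' PDF transforms points as row vectors, [x y 1] × CTM, so the point matrix has to be the left operand;
            ' with the operands swapped only the translation part of the CTM would be applied
            Dim Transformed = New Matrix(CSng(Point.GetX), CSng(Point.GetY)).Multiply(CurrentTransformation)
            Return String.Format(CultureInfo.InvariantCulture, "({0}, {1})", Transformed.Get(Matrix.I31), Transformed.Get(Matrix.I32))
        End Function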
    End Class
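
    The listener above returns Nothing from GetSupportedEvents, which iText treats as "all event types", and does the filtering inside EventOccurred. As a minimal sketch of the alternative, a listener can instead announce that it only wants path events. The class name PathOnlyListener is made up for this illustration, and the console output is just a placeholder for real processing:

    ```vbnet
    Imports System
    Imports System.Collections.Generic
    Imports iText.Kernel.Pdf.Canvas.Parser
    Imports iText.Kernel.Pdf.Canvas.Parser.Data
    Imports iText.Kernel.Pdf.Canvas.Parser.Listener

    ' Hypothetical listener that asks the processor for RENDER_PATH events only
    Public Class PathOnlyListener
        Implements IEventListener

        Private Shared ReadOnly Supported As ICollection(Of EventType) =
            New HashSet(Of EventType) From {EventType.RENDER_PATH}

        Public Sub EventOccurred(data As IEventData, type As EventType) Implements IEventListener.EventOccurred
            ' Defensive cast: anything that is not path data is skipped
            Dim Info = TryCast(data, PathRenderInfo)
            If Info Is Nothing Then Return
            Console.WriteLine("Path with {0} subpath(s)", Info.GetPath.GetSubpaths.Count)
        End Sub

        Public Function GetSupportedEvents() As ICollection(Of EventType) Implements IEventListener.GetSupportedEvents
            Return Supported
        End Function
    End Class
    ```

    Either style should work; declaring the supported events merely keeps unrelated text and image events away from the listener.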
    

    Using this event listener you can inspect the pages of your document:

    Using PdfDocument As New PdfDocument(New PdfReader(...))
        Dim PathListener As New PathListener
        Dim PdfCanvasProcessor As New PdfCanvasProcessor(PathListener)
        For page As Integer = 1 To PdfDocument.GetNumberOfPages
            PdfCanvasProcessor.ProcessPageContent(PdfDocument.GetPage(page))
        Next
    End Using
    

    The output may then look like this:

    Path from (51.2, 723.57) line to (512.17, 723.57), line to (512.17, 736.97), line to (51.2, 736.97) (closed) - Invisible, clipping
    Path from (108.6, 516.6) curve via (108.6, 569.29) and (160.18, 612) to (223.8, 612), curve via (287.42, 612) and (339, 569.29) to (339, 516.6), curve via (339, 463.91) and (287.42, 421.2) to (223.8, 421.2), curve via (160.18, 421.2) and (108.6, 463.91) to (108.6, 516.6) (closed) - Filled with DeviceRgb(0.31, 0.506, 0.741)
    Path from (174.89, 545.13) curve via (174.89, 550.62) and (180.27, 555.07) to (186.89, 555.07), curve via (193.52, 555.07) and (198.89, 550.62) to (198.89, 545.13), curve via (198.89, 539.64) and (193.52, 535.19) to (186.89, 535.19), curve via (180.27, 535.19) and (174.89, 539.64) to (174.89, 545.13) (closed) and from (248.71, 545.13) curve via (248.71, 550.62) and (254.08, 555.07) to (260.71, 555.07), curve via (267.33, 555.07) and (272.71, 550.62) to (272.71, 545.13), curve via (272.71, 539.64) and (267.33, 535.19) to (260.71, 535.19), curve via (254.08, 535.19) and (248.71, 539.64) to (248.71, 545.13) (closed) - Filled with DeviceRgb(0.251, 0.408, 0.596)
    Path from (174.89, 545.13) curve via (174.89, 550.62) and (180.27, 555.07) to (186.89, 555.07), curve via (193.52, 555.07) and (198.89, 550.62) to (198.89, 545.13), curve via (198.89, 539.64) and (193.52, 535.19) to (186.89, 535.19), curve via (180.27, 535.19) and (174.89, 539.64) to (174.89, 545.13) (closed) and from (248.71, 545.13) curve via (248.71, 550.62) and (254.08, 555.07) to (260.71, 555.07), curve via (267.33, 555.07) and (272.71, 550.62) to (272.71, 545.13), curve via (272.71, 539.64) and (267.33, 535.19) to (260.71, 535.19), curve via (254.08, 535.19) and (248.71, 539.64) to (248.71, 545.13) (closed) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
    Path from (161.36, 475) curve via (202.99, 451.32) and (244.56, 451.32) to (286.09, 475) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
    Path from (108.6, 516.6) curve via (108.6, 569.29) and (160.18, 612) to (223.8, 612), curve via (287.42, 612) and (339, 569.29) to (339, 516.6), curve via (339, 463.91) and (287.42, 421.2) to (223.8, 421.2), curve via (160.18, 421.2) and (108.6, 463.91) to (108.6, 516.6) (closed) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
    Path from (51.2, 565.15) line to (512.17, 565.15), line to (512.17, 578.55), line to (51.2, 578.55) (closed) - Invisible, clipping
    Path from (147.8, 556.2) curve via (147.8, 608.89) and (199.38, 651.6) to (263, 651.6), curve via (326.62, 651.6) and (378.2, 608.89) to (378.2, 556.2), curve via (378.2, 503.51) and (326.62, 460.8) to (263, 460.8), curve via (199.38, 460.8) and (147.8, 503.51) to (147.8, 556.2) (closed) - Filled with DeviceRgb(0.31, 0.506, 0.741)
    Path from (214.09, 584.73) curve via (214.09, 590.22) and (219.47, 594.67) to (226.09, 594.67), curve via (232.72, 594.67) and (238.09, 590.22) to (238.09, 584.73), curve via (238.09, 579.24) and (232.72, 574.79) to (226.09, 574.79), curve via (219.47, 574.79) and (214.09, 579.24) to (214.09, 584.73) (closed) and from (287.91, 584.73) curve via (287.91, 590.22) and (293.28, 594.67) to (299.91, 594.67), curve via (306.53, 594.67) and (311.91, 590.22) to (311.91, 584.73), curve via (311.91, 579.24) and (306.53, 574.79) to (299.91, 574.79), curve via (293.28, 574.79) and (287.91, 579.24) to (287.91, 584.73) (closed) - Filled with DeviceRgb(0.251, 0.408, 0.596)
    Path from (214.09, 584.73) curve via (214.09, 590.22) and (219.47, 594.67) to (226.09, 594.67), curve via (232.72, 594.67) and (238.09, 590.22) to (238.09, 584.73), curve via (238.09, 579.24) and (232.72, 574.79) to (226.09, 574.79), curve via (219.47, 574.79) and (214.09, 579.24) to (214.09, 584.73) (closed) and from (287.91, 584.73) curve via (287.91, 590.22) and (293.28, 594.67) to (299.91, 594.67), curve via (306.53, 594.67) and (311.91, 590.22) to (311.91, 584.73), curve via (311.91, 579.24) and (306.53, 574.79) to (299.91, 574.79), curve via (293.28, 574.79) and (287.91, 579.24) to (287.91, 584.73) (closed) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
    Path from (200.56, 514.6) curve via (242.19, 490.92) and (283.76, 490.92) to (325.29, 514.6) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
    Path from (147.8, 556.2) curve via (147.8, 608.89) and (199.38, 651.6) to (263, 651.6), curve via (326.62, 651.6) and (378.2, 608.89) to (378.2, 556.2), curve via (378.2, 503.51) and (326.62, 460.8) to (263, 460.8), curve via (199.38, 460.8) and (147.8, 503.51) to (147.8, 556.2) (closed) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
    Path from (103, 398.43) line to (487, 398.43), line to (487, 605.83), line to (103, 605.83) (closed) - Filled with DeviceRgb(0.31, 0.506, 0.741)
    Path from (103, 398.43) line to (487, 398.43), line to (487, 605.83), line to (103, 605.83) (closed) - Stroked with DeviceGray(0.525)
    Path from (229.4, 344.2) curve via (229.4, 363.31) and (246.59, 378.8) to (267.8, 378.8), curve via (289.01, 378.8) and (306.2, 363.31) to (306.2, 344.2), curve via (306.2, 325.09) and (289.01, 309.6) to (267.8, 309.6), curve via (246.59, 309.6) and (229.4, 325.09) to (229.4, 344.2) (closed) - Filled with DeviceRgb(0.31, 0.506, 0.741)
    Path from (229.4, 344.2) curve via (229.4, 363.31) and (246.59, 378.8) to (267.8, 378.8), curve via (289.01, 378.8) and (306.2, 363.31) to (306.2, 344.2), curve via (306.2, 325.09) and (289.01, 309.6) to (267.8, 309.6), curve via (246.59, 309.6) and (229.4, 325.09) to (229.4, 344.2) (closed) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
    Path from (237.4, 256.57) line to (266.74, 256.57), line to (275.8, 283), line to (284.86, 256.57), line to (314.2, 256.57), line to (290.47, 240.23), line to (299.53, 213.8), line to (275.8, 230.14), line to (252.07, 213.8), line to (261.13, 240.23) (closed) - Filled with DeviceRgb(0.31, 0.506, 0.741)
    Path from (237.4, 256.57) line to (266.74, 256.57), line to (275.8, 283), line to (284.86, 256.57), line to (314.2, 256.57), line to (290.47, 240.23), line to (299.53, 213.8), line to (275.8, 230.14), line to (252.07, 213.8), line to (261.13, 240.23) (closed) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
    Path from (163, 432.4) curve via (163, 485.09) and (216.9, 527.8) to (283.4, 527.8), curve via (349.9, 527.8) and (403.8, 485.09) to (403.8, 432.4), curve via (403.8, 379.71) and (349.9, 337) to (283.4, 337), curve via (216.9, 337) and (163, 379.71) to (163, 432.4) (closed) - Filled with DeviceRgb(0.31, 0.506, 0.741)
    Path from (232.29, 460.93) curve via (232.29, 466.42) and (237.9, 470.87) to (244.83, 470.87), curve via (251.75, 470.87) and (257.37, 466.42) to (257.37, 460.93), curve via (257.37, 455.44) and (251.75, 450.99) to (244.83, 450.99), curve via (237.9, 450.99) and (232.29, 455.44) to (232.29, 460.93) (closed) and from (309.43, 460.93) curve via (309.43, 466.42) and (315.05, 470.87) to (321.97, 470.87), curve via (328.9, 470.87) and (334.51, 466.42) to (334.51, 460.93), curve via (334.51, 455.44) and (328.9, 450.99) to (321.97, 450.99), curve via (315.05, 450.99) and (309.43, 455.44) to (309.43, 460.93) (closed) - Filled with DeviceRgb(0.251, 0.408, 0.596)
    Path from (232.29, 460.93) curve via (232.29, 466.42) and (237.9, 470.87) to (244.83, 470.87), curve via (251.75, 470.87) and (257.37, 466.42) to (257.37, 460.93), curve via (257.37, 455.44) and (251.75, 450.99) to (244.83, 450.99), curve via (237.9, 450.99) and (232.29, 455.44) to (232.29, 460.93) (closed) and from (309.43, 460.93) curve via (309.43, 466.42) and (315.05, 470.87) to (321.97, 470.87), curve via (328.9, 470.87) and (334.51, 466.42) to (334.51, 460.93), curve via (334.51, 455.44) and (328.9, 450.99) to (321.97, 450.99), curve via (315.05, 450.99) and (309.43, 455.44) to (309.43, 460.93) (closed) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
    Path from (218.14, 390.8) curve via (261.65, 367.12) and (305.1, 367.12) to (348.51, 390.8) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
    Path from (163, 432.4) curve via (163, 485.09) and (216.9, 527.8) to (283.4, 527.8), curve via (349.9, 527.8) and (403.8, 485.09) to (403.8, 432.4), curve via (403.8, 379.71) and (349.9, 337) to (283.4, 337), curve via (216.9, 337) and (163, 379.71) to (163, 432.4) (closed) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
    Path from (51.2, 60.025) line to (420.15, 60.025), line to (420.15, 736.975), line to (51.2, 736.975) (closed) - Invisible, clipping
    Path from (255.48, 564.55) line to (368.53, 564.55), line to (368.53, 579.15), line to (255.48, 579.15) (closed) - Filled with DeviceRgb(0.31, 0.506, 0.741)
    Path from (255.48, 550.13) line to (368.53, 550.13), line to (368.53, 564.755), line to (255.48, 564.755) (closed) - Filled with DeviceRgb(0.863, 0.902, 0.945)
    Path from (255.48, 521.32) line to (368.53, 521.32), line to (368.53, 535.92), line to (255.48, 535.92) (closed) - Filled with DeviceRgb(0.863, 0.902, 0.945)
    Path from (255.48, 492.52) line to (368.53, 492.52), line to (368.53, 507.12), line to (255.48, 507.12) (closed) - Filled with DeviceRgb(0.863, 0.902, 0.945)
    Path from (255.48, 463.73) line to (368.53, 463.73), line to (368.53, 478.33), line to (255.48, 478.33) (closed) - Filled with DeviceRgb(0.863, 0.902, 0.945)
    Path from (255.48, 434.9) line to (368.53, 434.9), line to (368.53, 449.525), line to (255.48, 449.525) (closed) - Filled with DeviceRgb(0.863, 0.902, 0.945)
    Path from (255.48, 406.1) line to (368.53, 406.1), line to (368.53, 420.7), line to (255.48, 420.7) (closed) - Filled with DeviceRgb(0.863, 0.902, 0.945)
    Path from (51.2, 565.15) line to (420.15, 565.15), line to (420.15, 578.55), line to (51.2, 578.55) (closed) - Invisible, clipping
    Path from (49.2, 57.825) line to (422.35, 57.825), line to (422.35, 738.975), line to (49.2, 738.975) (closed) - Invisible, clipping
    Path from (255.18, 579.45) line to (255.18, 405.8) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
    Path from (255.08, 405.7) line to (256.08, 405.7), line to (256.08, 579.55), line to (255.08, 579.55) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
    Path from (368.02, 578.45) line to (368.02, 405.8) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
    Path from (367.93, 405.7) line to (368.93, 405.7), line to (368.93, 578.55), line to (367.93, 578.55) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
    Path from (256.18, 579.45) line to (368.83, 579.45) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
    Path from (256.08, 578.55) line to (368.93, 578.55), line to (368.93, 579.55), line to (256.08, 579.55) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
    Path from (256.18, 565.05) line to (368.83, 565.05) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
    Path from (256.08, 564.15) line to (368.93, 564.15), line to (368.93, 565.15), line to (256.08, 565.15) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
    Path from (256.18, 550.63) line to (368.83, 550.63) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
    Path from (256.08, 549.72) line to (368.93, 549.72), line to (368.93, 550.72), line to (256.08, 550.72) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
    Path from (256.18, 536.22) line to (368.83, 536.22) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
    Path from (256.08, 535.33) line to (368.93, 535.33), line to (368.93, 536.33), line to (256.08, 536.33) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
    Path from (256.18, 521.82) line to (368.83, 521.82) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
    Path from (256.08, 520.92) line to (368.93, 520.92), line to (368.93, 521.92), line to (256.08, 521.92) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
    Path from (256.18, 507.42) line to (368.83, 507.42) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
    Path from (256.08, 506.52) line to (368.93, 506.52), line to (368.93, 507.52), line to (256.08, 507.52) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
    Path from (256.18, 493.02) line to (368.83, 493.02) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
    Path from (256.08, 492.13) line to (368.93, 492.13), line to (368.93, 493.13), line to (256.08, 493.13) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
    Path from (256.18, 478.63) line to (368.83, 478.63) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
    Path from (256.08, 477.73) line to (368.93, 477.73), line to (368.93, 478.73), line to (256.08, 478.73) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
    Path from (256.18, 464.23) line to (368.83, 464.23) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
    Path from (256.08, 463.32) line to (368.93, 463.32), line to (368.93, 464.32), line to (256.08, 464.32) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
    Path from (256.18, 449.82) line to (368.83, 449.82) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
    Path from (256.08, 448.9) line to (368.93, 448.9), line to (368.93, 449.925), line to (256.08, 449.925) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
    Path from (256.18, 435.4) line to (368.83, 435.4) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
    Path from (256.08, 434.5) line to (368.93, 434.5), line to (368.93, 435.5), line to (256.08, 435.5) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
    Path from (256.18, 421) line to (368.83, 421) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
    Path from (256.08, 420.1) line to (368.93, 420.1), line to (368.93, 421.1), line to (256.08, 421.1) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
    Path from (256.18, 406.6) line to (368.83, 406.6) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
    Path from (256.08, 405.7) line to (368.93, 405.7), line to (368.93, 406.7), line to (256.08, 406.7) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
    Path from (280.6, 359.3) line to (287.93, 380.62), line to (307.13, 393.8), line to (330.87, 393.8), line to (350.07, 380.62), line to (357.4, 359.3), line to (350.07, 337.98), line to (330.87, 324.8), line to (307.13, 324.8), line to (287.93, 337.98) (closed) - Filled with DeviceRgb(0.31, 0.506, 0.741)
    Path from (280.6, 359.3) line to (287.93, 380.62), line to (307.13, 393.8), line to (330.87, 393.8), line to (350.07, 380.62), line to (357.4, 359.3), line to (350.07, 337.98), line to (330.87, 324.8), line to (307.13, 324.8), line to (287.93, 337.98) (closed) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
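
    Since the original question asked for rectangles with their location and fill color, the same mechanism can be narrowed down. The following is only a sketch under a few assumptions: a rectangle shows up as a subpath of straight lines with four corners (as in the sample output above), only axis-aligned rectangles after applying the CTM are of interest, and page rotation is ignored. The names FilledRectangleListener, ToRectangle, Transform, Near and the tolerance value are invented for this example:

    ```vbnet
    Imports System
    Imports System.Collections.Generic
    Imports iText.Kernel.Colors
    Imports iText.Kernel.Geom
    Imports iText.Kernel.Pdf
    Imports iText.Kernel.Pdf.Canvas.Parser
    Imports iText.Kernel.Pdf.Canvas.Parser.Data
    Imports iText.Kernel.Pdf.Canvas.Parser.Listener

    ' Hypothetical listener: keeps only filled subpaths that form an axis-aligned rectangle
    Public Class FilledRectangleListener
        Implements IEventListener

        Private Const Tolerance As Double = 0.01 ' assumed tolerance in user space units

        ' Bounding box (default user space) and fill color of every rectangle found
        Public ReadOnly Found As New List(Of Tuple(Of Rectangle, Color))

        Public Sub EventOccurred(data As IEventData, type As EventType) Implements IEventListener.EventOccurred
            If type <> EventType.RENDER_PATH Then Return
            Dim Info = CType(data, PathRenderInfo)
            If (Info.GetOperation And PathRenderInfo.FILL) <> PathRenderInfo.FILL Then Return
            For Each Candidate In Info.GetPath.GetSubpaths
                Dim Box = ToRectangle(Candidate, Info.GetCtm)
                If Box IsNot Nothing Then Found.Add(Tuple.Create(Box, Info.GetFillColor))
            Next
        End Sub

        Public Function GetSupportedEvents() As ICollection(Of EventType) Implements IEventListener.GetSupportedEvents
            Return Nothing ' all events; EventOccurred filters
        End Function

        ' Returns the bounding box if the subpath is an axis-aligned rectangle, otherwise Nothing
        Private Shared Function ToRectangle(Candidate As Subpath, Ctm As Matrix) As Rectangle
            Dim Corners As New List(Of Point)
            For Each Segment In Candidate.GetSegments
                If Not TypeOf Segment Is Line Then Return Nothing
                Dim Points = Segment.GetBasePoints
                If Corners.Count = 0 Then Corners.Add(Transform(Points(0), Ctm))
                Corners.Add(Transform(Points(1), Ctm))
            Next
            ' Drop an explicitly drawn closing line back to the start point
            If Corners.Count = 5 AndAlso Near(Corners(0).GetX, Corners(4).GetX) AndAlso Near(Corners(0).GetY, Corners(4).GetY) Then
                Corners.RemoveAt(4)
            End If
            If Corners.Count <> 4 Then Return Nothing
            ' Every edge must be either horizontal or vertical, and the two must alternate
            Dim Horizontal(3) As Boolean
            For I = 0 To 3
                Dim A = Corners(I)
                Dim B = Corners((I + 1) Mod 4)
                Horizontal(I) = Near(A.GetY, B.GetY)
                If Horizontal(I) = Near(A.GetX, B.GetX) Then Return Nothing ' diagonal or zero-length edge
            Next
            For I = 0 To 3
                If Horizontal(I) = Horizontal((I + 1) Mod 4) Then Return Nothing
            Next
            ' Corners 0 and 2 are now opposite corners
            Dim MinX = Math.Min(Corners(0).GetX, Corners(2).GetX)
            Dim MinY = Math.Min(Corners(0).GetY, Corners(2).GetY)
            Dim MaxX = Math.Max(Corners(0).GetX, Corners(2).GetX)
            Dim MaxY = Math.Max(Corners(0).GetY, Corners(2).GetY)
            Return New Rectangle(CSng(MinX), CSng(MinY), CSng(MaxX - MinX), CSng(MaxY - MinY))
        End Function

        Private Shared Function Transform(P As Point, Ctm As Matrix) As Point
            ' [x y 1] × CTM: the point matrix is the left operand
            Dim M = New Matrix(CSng(P.GetX), CSng(P.GetY)).Multiply(Ctm)
            Return New Point(M.Get(Matrix.I31), M.Get(Matrix.I32))
        End Function

        Private Shared Function Near(A As Double, B As Double) As Boolean
            Return Math.Abs(A - B) < Tolerance
        End Function
    End Class

    Module Demo
        Sub Main(args As String())
            ' args(0) is assumed to be the path of the PDF to inspect
            Using Document As New PdfDocument(New PdfReader(args(0)))
                For PageNumber = 1 To Document.GetNumberOfPages
                    Dim RectangleListener As New FilledRectangleListener
                    Dim Processor As New PdfCanvasProcessor(RectangleListener)
                    Processor.ProcessPageContent(Document.GetPage(PageNumber))
                    For Each Hit In RectangleListener.Found
                        Console.WriteLine("Page {0}: x={1}, y={2}, width={3}, height={4}, fill values ({5})",
                                          PageNumber, Hit.Item1.GetX, Hit.Item1.GetY, Hit.Item1.GetWidth,
                                          Hit.Item1.GetHeight, String.Join(", ", Hit.Item2.GetColorValue))
                    Next
                Next
            End Using
        End Sub
    End Module
    ```

    This deliberately misses rectangles that are only stroked, rotated, or drawn with curves. Thin filled rectangles such as the 1-unit-wide table borders in the output above would be reported too, so a size filter may be needed depending on the document.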
    

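    As an aside, the above code uses the following imports (VB.NET projects usually import System.Collections.Generic and System.Linq at project level already; they are listed here for completeness):

    Imports System.Collections.Generic
    Imports System.Globalization
    Imports System.Linq
    Imports System.Text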
    Imports iText.Kernel.Colors
    Imports iText.Kernel.Geom
    Imports iText.Kernel.Pdf
    Imports iText.Kernel.Pdf.Canvas.Parser
    Imports iText.Kernel.Pdf.Canvas.Parser.Data
    Imports iText.Kernel.Pdf.Canvas.Parser.Listener