c# leadtools-sdk

How can I get text from a table in a PDF file?


I want to get text from a table in a PDF file.

I cannot get the cells in the table. I tried running the LEADTOOLS example, but it does not auto-detect the cells.

https://www.leadtools.com/help/leadtools/v20/dh/fo/iocrtablezonemanager.html

Can you give me some advice? Thanks, all.


Solution

  • For tables similar to the one in the image you posted, you should be able to find the cells using the IOcrPage.TableZoneManager.AutoDetectCells() method. This method is used in the OcrMultiEngineDemo project that ships with the current version of LEADTOOLS.

    Here’s how you can test it:

    1. Run the OCR Multi-Engine Demo.
    2. Select the OmniPage OCR Engine.
    3. Open the image or PDF file that contains the table.
    4. Draw a zone around the table.
    5. Choose “Update Zones…” from the OCR->Zones menu.
    6. In the “Update Zones” dialog, click “Detect Cells” as shown in the attached image.

    [Image: Table Cells]

    If this doesn’t give you the result you’re expecting, send the actual files you’re testing with to [email protected] and explain exactly how you tested them.
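
    The demo steps above correspond roughly to the calls below. This is only a minimal sketch, assuming the v20 .NET API (the Leadtools.Ocr namespace, the two-argument OcrEngineManager.CreateEngine overload, and OcrZone.Bounds being a LeadRect). The file path and the zone rectangle are hypothetical placeholders, and license setup is omitted, so check each call against the documentation for your installed version.

    ```csharp
    using System;
    using Leadtools;
    using Leadtools.Ocr;

    class TableCellsSketch
    {
        static void Main()
        {
            // Assumption: a valid LEADTOOLS license has already been set
            // (RasterSupport.SetLicense); that step is omitted here.
            string fileName = @"C:\path\to\table.pdf"; // hypothetical path

            // Step 2 of the demo: use the OmniPage OCR engine.
            using (IOcrEngine ocrEngine = OcrEngineManager.CreateEngine(OcrEngineType.OmniPage, false))
            {
                // Passing null everywhere asks the engine to use its defaults.
                ocrEngine.Startup(null, null, null, null);

                // Step 3: load the first page of the file as a raster image.
                RasterImage image = ocrEngine.RasterCodecsInstance.Load(fileName, 1);
                using (IOcrPage ocrPage = ocrEngine.CreatePage(image, OcrImageSharingMode.AutoDispose))
                {
                    // Step 4: "draw" a table zone around the table.
                    // The rectangle (x, y, width, height) is a made-up example.
                    OcrZone zone = new OcrZone();
                    zone.ZoneType = OcrZoneType.Table;
                    zone.Bounds = new LeadRect(100, 100, 1200, 800);
                    ocrPage.Zones.Add(zone);

                    // Steps 5 and 6: detect the cells of the zone at index 0.
                    ocrPage.TableZoneManager.AutoDetectCells(0);

                    // Recognize the page and print the text of the table zone.
                    ocrPage.Recognize(null);
                    Console.WriteLine(ocrPage.GetText(0));
                }

                ocrEngine.Shutdown();
            }
        }
    }
    ```

    If you would rather not hard-code the rectangle, calling ocrPage.AutoZone(null) and then looking for zones whose ZoneType is OcrZoneType.Table is a possible alternative, though whether a table is detected as such depends on the image.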