Tags: ocr, capture, kofax

How to get cell values from a table in a PDF scanned by Kofax into Excel


I am new to Kofax Capture and am working on retrieving data from a basic scanned invoice (PDF) that contains a table with a list of items, and writing that data to an index file. The steps I followed are:

  1. Created a document class and added an index field of type table, with table columns such as Date as fields. A screenshot of the Date column values in the PDF:

[screenshot of the Date column]

  2. During validation, the Date field values are all displayed in one field, as follows:

Date: 12/01/2018 12/02/2018 12/03/2018 12/04/2018

  3. The values exported to the index file are also in the above format.

Is there a way to retrieve the value of each cell as a separate entry, or as comma-separated values, using Kofax Capture?


Solution

  • Plain vanilla Kofax Capture (KC) can't extract data organized in tables. KC can extract static data, i.e. simple key-value pairs (e.g. invoice number, invoice date, total amount).

    Sure, you could try to extract a column like this: [screenshot]

    However, this could lead to issues down the line. What if the data isn't always in the same place? What if the data continues on subsequent pages? What if your zone is smaller than the entire column? What if there are overlapping texts? What if you want another column with additional data, essentially creating rows, but some columns have huge gaps (as in my screenshot)?

    If table extraction is a requirement, you might want to use Kofax Transformation Modules (KTM), which is available as an add-on to Kofax Capture. KTM has more sophisticated methods of extracting tables that are not limited to individual form layouts.
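
If KTM isn't an option, a stopgap is to split the merged value after export, outside Kofax entirely. The following is only a minimal sketch in Python, assuming the exported field is a single string of whitespace-separated dates in MM/DD/YYYY form, as in the question. The names `split_dates` and `to_comma_separated` are hypothetical helpers, not part of any Kofax API, and this does nothing about the caveats above (shifting positions, multiple pages, gaps between rows).

```python
import re
from typing import List

# Assumption: dates look like 12/01/2018 (two digits, two digits, four digits).
DATE_PATTERN = re.compile(r"\b\d{2}/\d{2}/\d{4}\b")


def split_dates(field_value: str) -> List[str]:
    """Return every date found in the merged field value, in original order."""
    return DATE_PATTERN.findall(field_value)


def to_comma_separated(field_value: str) -> str:
    """Join the individual dates with commas, e.g. for a single CSV cell."""
    return ",".join(split_dates(field_value))


if __name__ == "__main__":
    # The merged value exactly as shown in the question.
    merged = "12/01/2018 12/02/2018 12/03/2018 12/04/2018"
    for date in split_dates(merged):
        print(date)
    print(to_comma_separated(merged))
```

Matching on a date pattern rather than splitting on spaces means stray OCR text between the values is skipped, but it also means a misread date (say, a missing digit) is silently dropped, so the output is worth checking against the row count on the invoice.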