amazon-web-services, text-extraction, amazon-textract

How does AWS Textract measure the number of pages?


Amazon's pricing page (https://aws.amazon.com/textract/pricing/) states that in the US East (Ohio) region, document analysis with forms costs 5 cents per page.

I am wondering how a page is measured. For example, if images of the form are cropped and placed together into one PDF page, would this still count as one page?

I am also aware that Textract processes images. How is a page counted when the input is an image?

Would it be a cost-saving mechanism to fit as much of the text needed for analysis as possible onto one PDF page, even though this may slightly decrease accuracy?

Our company needs to process millions of paper forms, so this is literally the difference between a $5,000 monthly bill and a $200,000 monthly bill from Amazon. For now we are forced to use document text detection at 0.1 cent a page, but we would like to use form/table data analysis, which currently costs 6 cents a page.


Solution

  • The pricing page that you linked says:

    A single page may contain between 0 and 3,000 words.

    So I guess that as long as everything you need analyzed fits on a single page containing no more than 3,000 words, you will be billed for one page.
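
To put rough numbers on that idea, here is a minimal Python sketch. It assumes the 3,000-word figure quoted above is the only limit that matters for billing, which is a guess rather than something the pricing page confirms. It uses the 5 cents per page from the question, which may be out of date. The function names `pack_forms` and `cost_cents` are made up for illustration and are not part of any AWS SDK. The sketch also ignores whether the packed page is still legible enough for accurate extraction.

```python
# Assumed figures: the price comes from the question, the word limit from the
# quoted pricing note. Check the current pricing page before relying on them.
PRICE_CENTS_PER_PAGE = 5      # forms analysis, cents per page (assumption)
MAX_WORDS_PER_PAGE = 3000     # word limit per billed page (assumption)


def pack_forms(word_counts, max_words=MAX_WORDS_PER_PAGE):
    """Group forms onto pages so that no page exceeds max_words.

    Uses a simple first-fit-decreasing heuristic: take the forms from
    largest to smallest and put each one on the first page with room.
    Returns a list of pages, each page being a list of word counts.
    """
    pages = []
    for words in sorted(word_counts, reverse=True):
        if words > max_words:
            raise ValueError(f"a form with {words} words cannot fit on one page")
        for page in pages:
            if sum(page) + words <= max_words:
                page.append(words)
                break
        else:
            # No existing page had room, so start a new one.
            pages.append([words])
    return pages


def cost_cents(num_pages, price_cents=PRICE_CENTS_PER_PAGE):
    """Cost in whole cents for a given number of billed pages."""
    return num_pages * price_cents


if __name__ == "__main__":
    forms = [400] * 1000          # 1,000 hypothetical forms of 400 words each
    packed = pack_forms(forms)
    print("pages unpacked:", len(forms), "cost (cents):", cost_cents(len(forms)))
    print("pages packed:  ", len(packed), "cost (cents):", cost_cents(len(packed)))
```

With 400-word forms, seven fit under the assumed limit, so the page count drops by roughly a factor of seven. Whether Textract still reads seven shrunken forms accurately is something you would have to test on your own documents.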