c# image-processing jpeg resolution photoshop

Reduce & Optimize Scanned Documents File Size


My customer has about 100,000 scanned documents (jpg) which they work with every day. I want to know how I can reduce the file size of those images for faster file transfer and browsing.

The documents are scanned in black/white and saved in jpg format. They have a resolution of 150 dpi and a size of 1275x1753 pixels (width x height). The main problem is their file size, which is between ~150 KB and ~500 KB. I think that is too high for a black/white picture.

Is there a chance that I can reduce their size by changing the resolution, the color mode, or something similar? I tried playing around with Photoshop, but had no luck.

The scanned documents are used only for reviewing, so I don't think they need much detail or the original picture size.

I'm going to write the program in C#, so tell me if there is a good image library for this purpose.


Solution

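  • If your images are JPEG-compressed, then they are either grayscale (8 bits per pixel) or full color (24 or 32 bits per pixel). I am not aware of any other JPEG types out there. In practice this means that a black/white scan saved as JPEG still carries at least 8 bits per pixel.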

    Given that, you probably won't get much benefit from converting these images to other formats unless you also change their size (number of pixels in both directions) and/or their color space.

    There is a possibility that JPEG 2000 might compress your images better than JPEG, but a second round of lossy compression will introduce some more artifacts. You might try it for yourself and see whether the result is acceptable for you. I can't recommend any tools for this approach, though.

    I would recommend converting your images to bilevel ones (i.e. with only two colors) and compressing them with one of the fax compression schemes (Group 3 or Group 4). You might reduce the image dimensions at the same time, too. This can be easily achieved using the Docotic.Pdf library (Disclaimer: I work for the vendor of the library).

    Please take a look at my answer to a question similar to yours. That answer shows how to use the RecompressWithGroup4Fax and/or Scale methods to recompress existing images in a PDF.

    There is also valuable advice from @plinth about JBIG2 compression and other stuff. Well worth reading.
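
To make the bilevel + Group 4 idea concrete, here is a minimal sketch that does not use Docotic.Pdf at all. It swaps in the built-in System.Drawing (GDI+) TIFF encoder instead, so the output is a TIFF file rather than a PDF. The class and method names (ScanShrinker, RawBytes, SaveAsGroup4Tiff) are made up for illustration. It assumes Windows, because System.Drawing is Windows-only on current .NET, and it leaves the black/white threshold to whatever GDI+ picks during the 1-bit conversion, so check the result on a few real pages before running it over the whole archive.

```csharp
using System;
using System.Drawing;
using System.Drawing.Imaging;
using System.Linq;

public static class ScanShrinker
{
    // Size of the uncompressed pixel data, with each row padded to a whole byte.
    public static long RawBytes(int width, int height, int bitsPerPixel)
    {
        long bytesPerRow = ((long)width * bitsPerPixel + 7) / 8;
        return bytesPerRow * height;
    }

    // Loads an image, reduces it to 1 bit per pixel and writes a Group 4 TIFF.
    public static void SaveAsGroup4Tiff(string inputPath, string outputPath)
    {
        using (var source = new Bitmap(inputPath))
        {
            var area = new Rectangle(0, 0, source.Width, source.Height);

            // Normalize to 24-bit RGB first, then let GDI+ reduce that to two colors.
            using (var rgb = source.Clone(area, PixelFormat.Format24bppRgb))
            using (var bilevel = rgb.Clone(area, PixelFormat.Format1bppIndexed))
            using (var parameters = new EncoderParameters(1))
            {
                ImageCodecInfo tiffCodec = ImageCodecInfo.GetImageEncoders()
                    .First(codec => codec.FormatID == ImageFormat.Tiff.Guid);

                // Ask the TIFF encoder for CCITT Group 4 (fax) compression.
                parameters.Param[0] = new EncoderParameter(
                    System.Drawing.Imaging.Encoder.Compression,
                    (long)EncoderValue.CompressionCCITT4);

                bilevel.Save(outputPath, tiffCodec, parameters);
            }
        }
    }

    public static void Main(string[] args)
    {
        // Raw pixel data for one 1275x1753 page, before any compression.
        Console.WriteLine("8 bpp raw: " + RawBytes(1275, 1753, 8) + " bytes");
        Console.WriteLine("1 bpp raw: " + RawBytes(1275, 1753, 1) + " bytes");

        // Usage: ScanShrinker <input.jpg> <output.tif>
        if (args.Length == 2)
        {
            SaveAsGroup4Tiff(args[0], args[1]);
        }
    }
}
```

The RawBytes helper is only there to show why the color mode matters: going from 8 bits to 1 bit per pixel shrinks the raw data roughly eightfold before the fax compression even starts. If the reviewers can live with smaller pages, scaling the bitmap down before the 1-bit conversion is a second, independent lever.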