
Swift 3 - Which pixel format type do I use for best Tesseract text recognition?


I am using Swift 3 to build a mobile app that allows the user to take a picture and run Tesseract OCR over the resulting image.

According to this: https://developer.apple.com/reference/corevideo/cvpixelformatdescription/1563591-pixel_format_types

I have a lot of possible pixel format types to choose from for a picture taken on my iPhone 7. I'm a little lost as to what all these terms even mean to begin with, but does anyone have advice as to which format would give me the best chance of improving Tesseract text recognition?


Solution

  • kCVPixelFormatType_24RGB, kCVPixelFormatType_24BGR, kCVPixelFormatType_32ARGB, kCVPixelFormatType_32BGRA, kCVPixelFormatType_32ABGR, kCVPixelFormatType_32RGBA: any of these would be a good option, and they are also the most common ones (i.e. the pixel layouts behind 24-bit bitmap, 24-bit PNG, 32-bit bitmap, 32-bit PNG, etc.).

    Basically, a 24-bit format contains only the R, G and B components of each pixel; there is no alpha channel. A 32-bit format adds an alpha channel, so each pixel has R, G, B and A components. Usually 24-bit works really well with Tesseract, and 32-bit works really well when the alpha channel is uniform (0x00 or 0xFF for every pixel). This is equivalent to using a BMP or PNG file.
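To make the difference between the 24-bit and 32-bit layouts concrete, here is a minimal sketch in plain Swift that turns 32-bit BGRA bytes into 24-bit RGB bytes. The function name `rgbFromBGRA` is hypothetical, and the sketch assumes the bytes are tightly packed; a real pixel buffer may pad each row, which this does not handle.

```swift
/// Converts tightly packed 32-bit BGRA bytes into 24-bit RGB bytes by
/// reordering the colour components and discarding the alpha byte.
/// Assumption: no row padding, exactly 4 bytes per pixel.
func rgbFromBGRA(_ bgra: [UInt8]) -> [UInt8] {
    precondition(bgra.count % 4 == 0, "expected 4 bytes per pixel")
    var rgb = [UInt8]()
    rgb.reserveCapacity(bgra.count / 4 * 3)
    for i in stride(from: 0, to: bgra.count, by: 4) {
        rgb.append(bgra[i + 2]) // R
        rgb.append(bgra[i + 1]) // G
        rgb.append(bgra[i])     // B
        // bgra[i + 3] is the alpha byte; it is dropped.
    }
    return rgb
}
```

For example, `rgbFromBGRA([10, 20, 30, 255])` gives `[30, 20, 10]`: the same colour with the channels reordered and the alpha byte removed.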

    Note: The above covers only the formats. Ideally, your image needs to be of decent quality as well (the best is usually white text on a black background, black text on a white background, or some other strong contrast between the text and the background). Results will depend on the image as well, not just the format.
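One way to picture "strong contrast" is to reduce the image to pure black and white before recognition. The sketch below is only an illustration under stated assumptions: `binarize` is a hypothetical helper, the input is tightly packed 24-bit RGB, and the fixed threshold of 128 is an arbitrary choice that would need tuning for real photos. Tesseract also does its own thresholding internally, so a step like this is optional.

```swift
/// Maps tightly packed RGB bytes to one byte per pixel:
/// 255 (white) if the pixel is at least as bright as `threshold`, else 0 (black).
func binarize(rgb: [UInt8], threshold: UInt8 = 128) -> [UInt8] {
    precondition(rgb.count % 3 == 0, "expected 3 bytes per pixel")
    var out = [UInt8]()
    out.reserveCapacity(rgb.count / 3)
    for i in stride(from: 0, to: rgb.count, by: 3) {
        // Integer approximation of Rec. 601 luma: 0.299 R + 0.587 G + 0.114 B.
        let luma = (299 * Int(rgb[i]) + 587 * Int(rgb[i + 1]) + 114 * Int(rgb[i + 2])) / 1000
        out.append(luma >= Int(threshold) ? 255 : 0)
    }
    return out
}
```

Light pixels become white and dark pixels become black, so grey text on a slightly lighter grey background ends up as a two-tone image.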

    As for capture settings: allocating an AVCapturePhotoSettings gives you the default settings. You can create your own using:

    https://developer.apple.com/reference/avfoundation/avcapturephotosettings/1648673-photosettingswithformat?changes=latest_minor&language=objc

    It tells you what parameters to pass. It also lets you determine whether or not the capture should be high res, a live photo, etc. You can see here for more: https://developer.apple.com/reference/avfoundation/avcapturephotosettings?changes=latest_minor&language=objc
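As a rough sketch of requesting one of the pixel formats listed earlier, the function below asks for 32BGRA and falls back to the default settings if the output does not offer it. The name `makeBGRASettings` is hypothetical, and the code assumes a recent SDK in which `availablePhotoPixelFormatTypes` is an array of `OSType`; the iOS 10 SDK may expose these values as `NSNumber` instead, so the `contains` check would need adjusting there.

```swift
import AVFoundation

/// Returns photo settings that request uncompressed 32-bit BGRA output,
/// or the default settings if the output does not offer that format.
/// `output` should already be attached to a configured capture session.
func makeBGRASettings(for output: AVCapturePhotoOutput) -> AVCapturePhotoSettings {
    let wanted = kCVPixelFormatType_32BGRA
    guard output.availablePhotoPixelFormatTypes.contains(wanted) else {
        // Fall back to whatever the system chooses by default.
        return AVCapturePhotoSettings()
    }
    // The format dictionary tells AVFoundation which pixel format to deliver.
    return AVCapturePhotoSettings(
        format: [kCVPixelBufferPixelFormatTypeKey as String: wanted]
    )
}
```

The returned settings object would then be passed to the photo output's capture call together with a delegate.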

    availablePhotoCodecTypes returns the codecs available for compressed capture, such as JPEG. When you capture RAW or an uncompressed pixel format, the data is not compressed. As for the file formats mentioned above: BMP is usually stored uncompressed but can optionally use RLE (Run Length Encoding); PNG uses zlib (DEFLATE), which is lossless; JPEG does not use zlib but a lossy DCT-based scheme.

    For video, the equivalent list would contain video codecs such as H.264 or MPEG-4. See: https://www.thedroidsonroids.com/blog/ios/whats-new-avfoundation-ios-10/ for examples.