Search code examples
jpegextractexif

Extract thumbnail from jpeg file


I'd like to extract thumbnail image from jpegs, without any external library. I mean this is not too difficult, because I need to know where the thumbnail starts, and ends in the file, and simply cut it. I study many documentation ( ie.: http://www.media.mit.edu/pia/Research/deepview/exif.html ), and try to analyze jpegs, but not everything clear. I tried to track step by step the bytes, but in the deep I confused. Is there any good documentation, or readable source code to extract the info about thumbnail start and end position within a jpeg file?

Thank you!


Solution

  • For most JPEG images created by phones or digital cameras, the thumbnail image (if present) is stored in the APP1 marker (FFE1). Inside this marker segment is a TIFF file containing the EXIF information for the main image and the optional thumbnail image stored as a JPEG compressed image. The TIFF file usually contains two "pages" where the first page is the EXIF info and the second page is the thumbnail stored in the "old" TIFF type 6 format. Type 6 format is when a JPEG file is just stored as-is inside of a TIFF wrapper. If you want the simplest possible code to extract the thumbnail as a JFIF, you will need to do the following steps:

    1. Familiarize yourself with JFIF and TIFF markers/tags. JFIF markers consist of two bytes: 0xFF followed by the marker type (0xE1 for APP1). These two bytes are followed by the two-byte length stored in big-endian order. For TIFF files, consult the Adobe TIFF 6.0 reference.
    2. Search your JPEG file for the APP1 (FFE1) EXIF marker. There may be multiple APP1 markers and there may be multiple markers before the APP1.
    3. The APP1 marker you're looking for contains the letters "EXIF" immediately after the length field.
    4. Look for "II" or "MM" (6 bytes away from length) to indicate the endianness used in the TIFF file. II = Intel = little endian, MM = Motorola = big endian.
    5. Skip through the first page's tags to find the second IFD where the image is stored. In the second "page", look for the two TIFF tags which point to the JPEG data. Tag 0x201 has the offset of the JPEG data (relative to the II/MM) and tag 0x202 has the length in bytes.