Extract JPEG from TIFF file


Background: I have a large tiled TIFF file that is compressed with JPEG (new-style, compression 7 in the TIFF standard). I need to extract these tiles to individual .jpg files, and I need to do it without decompressing and recompressing the image data, because that would require too much compute. That rules out every library I know of.

I know a lot about TIFF file structure but almost nothing about JPEG file structure. My current code reads the JPEGTables tag data into a byte array (it goes to the offset the tag points to and reads from there), and a second piece of code reads the target tile into another byte array. I then write the table array to a new file, followed by the tile array. I overwrite the last 2 bytes of the table array and the first 2 bytes of the tile array with 0xFF, 0xFF, because both arrays start with the JPEG SOI marker and end with the EOI marker, and when the file contained more than one of each, no image program could open it.

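' Write the table data, stopping before its trailing EOI marker (last 2 bytes)
For i As Integer = 0 To TableArray.Count - 3
    stream.WriteByte(TableArray(i))
Next
' Replace the table's EOI (2 bytes) with 0xFF filler
stream.WriteByte(255)
stream.WriteByte(255)
' Replace the tile's leading SOI (2 bytes) with 0xFF filler
stream.WriteByte(255)
stream.WriteByte(255)
' Write the tile data, skipping its leading SOI marker (first 2 bytes)
For i As Integer = 2 To TileArray.Count - 1
    stream.WriteByte(TileArray(i))
Next
stream.Close()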
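
For comparison, the same splice can be done by dropping the two redundant markers instead of overwriting them with filler bytes. This is only a minimal sketch in Python (chosen for brevity, not the VB.NET used above). The function name `splice_tables_and_tile` is hypothetical, and it assumes both byte arrays are complete streams wrapped in SOI (FF D8) and EOI (FF D9), as described above.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def splice_tables_and_tile(tables: bytes, tile: bytes) -> bytes:
    """Join a JPEGTables stream and one tile into a single JPEG stream.

    Hypothetical helper: assumes both inputs start with SOI and end with EOI.
    """
    if not (tables.startswith(SOI) and tables.endswith(EOI)):
        raise ValueError("tables data is not wrapped in SOI/EOI")
    if not (tile.startswith(SOI) and tile.endswith(EOI)):
        raise ValueError("tile data is not wrapped in SOI/EOI")
    # Keep the SOI from the tables and the EOI from the tile; drop the
    # tables' EOI and the tile's SOI so each marker appears exactly once.
    return tables[:-2] + tile[2:]
```

The result has the same layout as the file written by the loops above, minus the four 0xFF filler bytes.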

Problem: My extracted tiles are all shaded pink where they should be white, almost like a color negative. They are not solid pink; I can see outlines of objects I know are in the original image. Does anyone have any ideas how I might solve this? I am doing this in VB.NET, but I don't think the language really matters here, since it seems to be a concept/algorithm/file-structure issue.

If someone would like me to post more of the code I am using, I can; I just need to know which part.

(Images: extracted tile and original.)

EDIT: I found in the Adobe Photoshop TIFF Technical Notes from March 22, 2002 a section that says:

Conversion from TIFF to interchange JPEG is more complex. A strip-based TIFF/JPEG file can be converted fairly easily if all strips use identical JPEG tables and no RSTn markers: just delete the overhead markers and insert RSTn markers between strips. Converting tiled images is harder, since the data will usually not be in the right order (unless the tiles are only one MCU high). This can still be done losslessly, but it will require undoing and redoing the entropy coding so that the DC coefficient differences can be updated.

Not sure if that is relevant to my problem or not.


Solution

  • The difficulty with TIFF files produced by Photoshop is that Photoshop can write JPEG-compressed data in the RGB colorspace. If you extract a single tile from your TIFF file and write it as an independent JPEG image, it will not display correctly, because decoders assume that the colorspace is YCbCr. There is a solution, as long as the viewing application respects the Adobe APP14 marker. This marker includes a byte that defines the color transform (colorspace). If you insert the following sequence of bytes immediately after the SOI marker (a JPEG stream must begin with SOI), your image will display correctly in many viewers.

    FF EE 00 0E 41 64 6F 62 65 00 64 80 00 00 00 00

    The last byte defines the transform; here 0 means no color transform is applied, so a three-component image is treated as RGB. You can read more about it here:

    Oracle JPEG metadata doc
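
    As a rough illustration of the fix, here is a minimal Python sketch that inserts the 16-byte Adobe APP14 segment into an already assembled JPEG stream. The names `ADOBE_APP14_RGB` and `add_adobe_rgb_marker` are hypothetical, and the sketch assumes the input already starts with SOI (FF D8). Because a JPEG stream has to begin with SOI, the segment is placed directly after it.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker

# Adobe APP14 segment, byte for byte as quoted in the answer:
# FF EE = APP14 marker, 00 0E = segment length (14, includes the length field),
# "Adobe", version 00 64, two 16-bit flag words, then the transform byte (0).
ADOBE_APP14_RGB = bytes.fromhex("FF EE 00 0E 41 64 6F 62 65 00 64 80 00 00 00 00")


def add_adobe_rgb_marker(jpeg: bytes) -> bytes:
    """Return a copy of `jpeg` with the Adobe APP14 segment right after SOI.

    Hypothetical helper: assumes `jpeg` is a complete stream starting with SOI.
    """
    if not jpeg.startswith(SOI):
        raise ValueError("data does not start with the JPEG SOI marker")
    # SOI stays first; the APP14 segment precedes the tables and scan data.
    return jpeg[:2] + ADOBE_APP14_RGB + jpeg[2:]
```

    Viewers that ignore the Adobe marker will still assume YCbCr, so this only helps with decoders that honor APP14.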