With libpng, I’m trying to extract the text chunks from a 44-megabyte PNG image (and preferably validate that the PNG data is not malformed, e.g. not lacking IEND, etc.). I could do that with png_read_png and png_get_text, but it took way too long, 0.47 seconds, which I’m pretty sure is because of the massive number of IDAT chunks in the image. How do I do this more quickly?
I didn’t need the pixels, so I tried to make libpng ignore the IDAT chunks. To skip them, I tried:

- Calling png_read_info(p_png, p_png_information); png_read_image(p_png, nullptr); png_read_end(p_png, p_png_information); to skip the IDAT chunks; it crashed and failed.
- Using png_set_keep_unknown_chunks to make libpng treat IDAT as an unknown chunk, together with png_set_read_user_chunk_fn(p_png, nullptr, discard_an_unknown_chunk) (where discard_an_unknown_chunk is a function that just does return 1;) to discard the unknown chunks; this failed with a weird CRC error on the first IDAT chunk.

Neither approach worked.
This runs as a Node.js C++ addon, mostly written in C++, on Windows 10, with an i9-9900K CPU @ 3.6 GHz and gigabytes of memory. The image file is read from an SSD with fs.readFileSync, a Node.js method that returns a Buffer, and handed to libpng for processing.
Yes, at first I blamed libpng for the prolonged computation. Now I see there might be other reasons for the delay. (If that’s the case, this would be a bad question with an XY problem.) Thank you for your comments; I’ll check my code again more thoroughly.
With every step of feeding the PNG data to the C++ addon kept the same, I ended up manually picking out and decoding only the text chunks, with some C pointer magic and a bit of C++. The performance was impressive (0.0020829 seconds of processing), almost immediate, though I don’t know exactly why or how.
B:\__A2MSUB\image-processing-utility>npm run test
> image-processing-utility@1.0.0 test B:\__A2MSUB\image-processing-utility
> node tests/test.js
----- “read_png_text_chunks (manual decoding, not using libpng.)” -----
[
  {
    type: 'tEXt',
    keyword: 'date:create',
    language_tag: null,
    translated_keyword: null,
    content: '2020-12-13T22:01:22+09:00',
    the_content_is_compressed: false
  },
  {
    type: 'tEXt',
    keyword: 'date:modify',
    language_tag: null,
    translated_keyword: null,
    content: '2020-12-13T21:53:58+09:00',
    the_content_is_compressed: false
  }
]
----- “read_png_text_chunks (manual decoding, not using libpng.)” took 0.013713 seconds.
B:\__A2MSUB\image-processing-utility>
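For illustration, here is a minimal sketch of what such a manual chunk walk can look like, assuming the whole PNG file is already in one contiguous memory buffer (this is a sketch of the approach, not the addon’s actual code), collecting only uncompressed tEXt chunks:

#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

struct TextChunk {
    std::string keyword;
    std::string content;
};

// Read a 32-bit big-endian integer (PNG stores chunk lengths this way).
static std::uint32_t read_u32_be(const unsigned char* p)
{
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
}

// Walk the chunk list: 8-byte signature, then chunks of
// [4-byte length][4-byte type][data][4-byte CRC], until IEND.
std::vector<TextChunk> read_png_text_chunks(const unsigned char* data, std::size_t size)
{
    std::vector<TextChunk> result;
    if (size < 8 || std::memcmp(data, "\x89PNG\r\n\x1a\n", 8) != 0)
        return result;                              // not a PNG signature
    std::size_t offset = 8;
    while (offset + 12 <= size) {
        const std::uint32_t length = read_u32_be(data + offset);
        const unsigned char* type  = data + offset + 4;
        const unsigned char* body  = data + offset + 8;
        if (length > size - offset - 12)
            break;                                  // malformed: chunk runs past EOF
        if (std::memcmp(type, "tEXt", 4) == 0) {
            // tEXt layout: keyword, one NUL separator, then the (uncompressed) text.
            const auto* sep = static_cast<const unsigned char*>(
                std::memchr(body, '\0', length));
            if (sep != nullptr)
                result.push_back({std::string(body, sep),
                                  std::string(sep + 1, body + length)});
        }
        if (std::memcmp(type, "IEND", 4) == 0)
            break;                                  // reached the end marker
        offset += 12 + length;                      // skip past data and CRC
    }
    return result;
}

Since a walk like this never inflates the IDAT data, the cost is essentially just scanning chunk headers.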
I had to do something similar, but in my case I wanted libpng to do all of the metadata chunk parsing (e.g. the eXIf, gAMA, pHYs, zTXt, cHRM, etc. chunks). Some of these chunks can appear after the IDAT chunks, which means the metadata can’t be read with just png_read_info. (The only way to get to them would be to do a full decode of the image, which is expensive, and then call png_read_end.)
My solution was to create a synthetic PNG byte stream that is fed to libpng via the read callback set with png_set_read_fn. In that callback, I skip all of the IDAT chunks in the source PNG file, and when I get to the IEND chunk, I instead emit a zero-length IDAT chunk.
Now I call png_read_info: it parses all of the metadata in all of the chunks it sees, stopping at the first IDAT, which in my synthetic PNG stream is really the end of the source PNG image. At that point I have all of the metadata and can query libpng for it via the png_get_xxx functions.
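A minimal sketch of that read path, assuming the synthetic stream already sits in a memory buffer and the read callback simply serves bytes from it (the on-the-fly variant is discussed next); the MemoryReader type and the error handling here are illustrative choices, not my exact code:

#include <csetjmp>
#include <cstdio>
#include <cstring>
#include <png.h>

struct MemoryReader {
    const unsigned char* data;
    std::size_t size;
    std::size_t offset;
};

// Read callback installed with png_set_read_fn(): copy the next `length`
// bytes of the in-memory synthetic stream into libpng's buffer.
static void read_from_memory(png_structp png_ptr, png_bytep out, png_size_t length)
{
    auto* reader = static_cast<MemoryReader*>(png_get_io_ptr(png_ptr));
    if (length > reader->size - reader->offset)
        png_error(png_ptr, "read past end of synthetic PNG stream");
    std::memcpy(out, reader->data + reader->offset, length);
    reader->offset += length;
}

void read_metadata(const unsigned char* synthetic, std::size_t synthetic_size)
{
    png_structp png_ptr = png_create_read_struct(PNG_LIBPNG_VER_STRING,
                                                 nullptr, nullptr, nullptr);
    png_infop info_ptr = png_create_info_struct(png_ptr);
    if (png_ptr == nullptr || info_ptr == nullptr)
        return;
    MemoryReader reader{synthetic, synthetic_size, 0};

    if (setjmp(png_jmpbuf(png_ptr))) {              // libpng error handling
        png_destroy_read_struct(&png_ptr, &info_ptr, nullptr);
        return;
    }

    png_set_read_fn(png_ptr, &reader, read_from_memory);
    png_read_info(png_ptr, info_ptr);               // stops at the first IDAT

    // Query whatever metadata is needed; text chunks as an example:
    png_textp text_ptr = nullptr;
    int num_text = 0;
    png_get_text(png_ptr, info_ptr, &text_ptr, &num_text);
    for (int i = 0; i < num_text; ++i)
        std::printf("%s: %s\n", text_ptr[i].key, text_ptr[i].text);

    double gamma = 0.0;
    if (png_get_gAMA(png_ptr, info_ptr, &gamma))    // other png_get_xxx calls work alike
        std::printf("gAMA: %f\n", gamma);

    png_destroy_read_struct(&png_ptr, &info_ptr, nullptr);
}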
The read callback that creates the synthetic PNG stream is a little complicated because libpng invokes it many times, each time for a small section of the stream. I solved that with a simple state machine that processes the source PNG progressively, producing the synthetic PNG stream on the fly. You could avoid those complexities by producing the synthetic PNG stream up-front in memory before calling png_read_info: without any real IDATs, the full synthetic stream is bound to be small.
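If you go the up-front route, the construction amounts to copying every non-IDAT chunk and substituting the zero-length IDAT when IEND is reached. A minimal sketch, assuming the source PNG is already in memory (the names and the zlib crc32 call for the empty chunk’s checksum are my own choices here, not necessarily what you would pick):

#include <cstdint>
#include <cstring>
#include <vector>
#include <zlib.h>   // for crc32(); zlib is already a dependency of libpng

static std::uint32_t read_u32_be(const unsigned char* p)
{
    return (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
           (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
}

static void append_u32_be(std::vector<unsigned char>& out, std::uint32_t v)
{
    out.push_back(static_cast<unsigned char>(v >> 24));
    out.push_back(static_cast<unsigned char>(v >> 16));
    out.push_back(static_cast<unsigned char>(v >> 8));
    out.push_back(static_cast<unsigned char>(v));
}

// Copy the signature and every non-IDAT chunk; when IEND is reached, emit a
// zero-length IDAT instead, so png_read_info() stops right there.
std::vector<unsigned char> build_synthetic_png(const unsigned char* src, std::size_t size)
{
    std::vector<unsigned char> out;
    if (size < 8)
        return out;
    out.assign(src, src + 8);                       // PNG signature
    std::size_t offset = 8;
    while (offset + 12 <= size) {
        const std::uint32_t length = read_u32_be(src + offset);
        const unsigned char* type  = src + offset + 4;
        if (length > size - offset - 12)
            break;                                  // malformed; stop copying
        const std::size_t chunk_size = 12 + length; // length + type + data + CRC
        if (std::memcmp(type, "IEND", 4) == 0) {
            static const unsigned char idat_type[4] = {'I', 'D', 'A', 'T'};
            append_u32_be(out, 0);                  // zero-length IDAT
            out.insert(out.end(), idat_type, idat_type + 4);
            append_u32_be(out, static_cast<std::uint32_t>(
                crc32(crc32(0L, Z_NULL, 0), idat_type, 4)));
            break;
        }
        if (std::memcmp(type, "IDAT", 4) != 0)      // drop all real IDATs
            out.insert(out.end(), src + offset, src + offset + chunk_size);
        offset += chunk_size;
    }
    return out;
}

The resulting buffer can then be fed to png_read_info through a trivial memory-reader callback like the one sketched above.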
While I don't have benchmarks to share here, the final solution is fast because the IDATs are skipped entirely and never decoded. (I use a file seek to skip each IDAT in the source PNG after reading its 32-bit chunk length.)