This does a sane_read (with SANE) and converts to a java application.
expected_bytes
might not be 100% accurate, hence the use of a vector<>.
Something is wrong because the resulting image is wrong. It works if I use env->SetByteArrayRegion
directly.
So, the error is on the vector<> context.
What am I doing wrong?
long expected_bytes = pars.bytes_per_line * pars.lines *
((pars.format == SANE_FRAME_RGB
|| pars.format == SANE_FRAME_GRAY) ? 1 : 3);
//fprintf (stderr, "Expected bytes: %ld\n", expected_bytes);
SANE_Byte buffer[2048];
SANE_Int bytes_read;
vector<SANE_Byte *> data;
data.reserve(expected_bytes);
do {
status = sane_read((SANE_Handle *)hnd, buffer, sizeof(buffer), &bytes_read);
if (status != SANE_STATUS_GOOD) {
if (status == SANE_STATUS_EOF) {
break;
}
throwEx(env, getError(status));
break;
}
if (bytes_read > 0) {
data.insert(data.end(), &buffer, (&buffer) + bytes_read);
}
}
while(1);
jbyteArray bytes = env->NewByteArray(data.size());
env->SetByteArrayRegion(bytes, 0, data.size(), (const jbyte *) &data[0]);
The first problem is that you've declared data
as a vector of SANE_Byte*
(i.e. pointers) rather than a vector SANE_Byte
(raw characters). When you call SetByteArrayRegion()
, you want to be passing in a pointer to character data, but you're actually passing in a pointer to an array of pointers; those will not yield the desired image data.
The second problem is the way you're inserting data into the vector:
SANE_Byte buffer[2048];
...
data.insert(data.end(), &buffer, (&buffer) + bytes_read);
Since buffer
has type SANE_Byte[2048]
(array of 2048 bytes), that means that &buffer
has type SANE_Byte(*)[2048]
(pointer to array of 2048 bytes). Arrays decay into pointers to their first elements, so in most expression contexts, both buffer
and &buffer
have equivalent values, but they have very different types and don't behave the same when used with pointer arithmetic.
Suppose buffer
is allocated at address 0x01000000. buffer+1
decays into 0x01000001, and buffer+2047
decays into 0x010007FF, the last element of the array. No surprises there. &buffer
is also 0x01000000, but if you consider the expression (&buffer) + 1
, something different happens: it points 2048 bytes past the start of the buffer.
Here's a memory diagram:
Address +0 +1 ... +0x7FF +--------+--------+- -+-----------+ 0x01000000 | buf[0] | buf[1] | ... | buf[2047] | // This row is &buffer +--------+--------+- -+-----------+ 0x01000800 | ? | ? | ... | ? | // This row is (&buffer) + 1 +--------+--------+- -+-----------+ 0x01001000 | ? | ? | ... | ? | // This row is (&buffer) + 2 +--------+--------+- -+-----------+ ... ... +--------+--------+- -+-----------+ 0x013FF800 | ? | ? | ... | ? | // This row is (&buffer) + 2047 +--------+--------+- -+-----------+ 0x01400000 | ? | ? | ... | ? | // This row is (&buffer) + 2048 +--------+--------+- -+-----------+
Thus, if sane_read()
returns a full buffer of 2048 bytes, the memory range being inserted to the vector of [&buffer, (&buffer) + bytes_read)
is 4 MB of character data being read out of the memory region beginning at buffer
, which is an enormous buffer overflow. Then, the call to data.insert()
is interpreting those 4 MB as an array of pointers; if you're on a 64-bit system, that would be over 1 million pointers, whose contents are undefined. To be honest, I'm very surprised this didn't crash immediately.
The proper way to construct the vector would be to declare it of type vector<SANE_Byte>
(characters, not pointers), and copy each buffer into it on each loop iteration. The way to reference the right byte ranges of buffer
is not to use &buffer
to (&buffer) + bytes_read
, but rather buffer
to buffer + bytes_read
, or equivalently &buffer[0]
to &buffer[bytes_read]
:
vector<SANE_Byte> data;
...
data.insert(data.end(), buffer, buffer + bytes_read);