Search code examples
c++java-native-interfacesane

how to append a buffer[] to a vector<> and then copy it to a jbytearray?


This does a sane_read (with SANE) and converts to a java application.

expected_bytes might not be 100% accurate, hence the use of a vector<>.

Something is wrong because the resulting image is wrong. It works if I use env->SetByteArrayRegion directly.

So, the error is on the vector<> context.

What am I doing wrong?

    long expected_bytes = pars.bytes_per_line * pars.lines *
                            ((pars.format == SANE_FRAME_RGB
                              || pars.format == SANE_FRAME_GRAY) ? 1 : 3);

    //fprintf (stderr, "Expected bytes: %ld\n", expected_bytes);

    SANE_Byte buffer[2048];
    SANE_Int bytes_read;
    vector<SANE_Byte *> data;
    data.reserve(expected_bytes);

    do {
        status = sane_read((SANE_Handle *)hnd, buffer, sizeof(buffer), &bytes_read);

        if (status != SANE_STATUS_GOOD) {
            if (status == SANE_STATUS_EOF) {
                break;
            }

            throwEx(env, getError(status));
            break;
        }

        if (bytes_read > 0) {
            data.insert(data.end(), &buffer, (&buffer) + bytes_read);
        }
    }
    while(1);

    jbyteArray bytes = env->NewByteArray(data.size());
    env->SetByteArrayRegion(bytes, 0, data.size(), (const jbyte *) &data[0]);

Solution

  • The first problem is that you've declared data as a vector of SANE_Byte* (i.e. pointers) rather than a vector SANE_Byte (raw characters). When you call SetByteArrayRegion(), you want to be passing in a pointer to character data, but you're actually passing in a pointer to an array of pointers; those will not yield the desired image data.

    The second problem is the way you're inserting data into the vector:

    SANE_Byte buffer[2048];
    ...
    data.insert(data.end(), &buffer, (&buffer) + bytes_read);
    

    Since buffer has type SANE_Byte[2048] (array of 2048 bytes), that means that &buffer has type SANE_Byte(*)[2048] (pointer to array of 2048 bytes). Arrays decay into pointers to their first elements, so in most expression contexts, both buffer and &buffer have equivalent values, but they have very different types and don't behave the same when used with pointer arithmetic.

    Suppose buffer is allocated at address 0x01000000. buffer+1 decays into 0x01000001, and buffer+2047 decays into 0x010007FF, the last element of the array. No surprises there. &buffer is also 0x01000000, but if you consider the expression (&buffer) + 1, something different happens: it points 2048 bytes past the start of the buffer.

    Here's a memory diagram:

    Address        +0       +1     ...    +0x7FF
               +--------+--------+-   -+-----------+
    0x01000000 | buf[0] | buf[1] | ... | buf[2047] |  // This row is &buffer
               +--------+--------+-   -+-----------+
    0x01000800 |   ?    |   ?    | ... |   ?       |  // This row is (&buffer) + 1
               +--------+--------+-   -+-----------+
    0x01001000 |   ?    |   ?    | ... |   ?       |  // This row is (&buffer) + 2
               +--------+--------+-   -+-----------+
    ...            ...
               +--------+--------+-   -+-----------+
    0x013FF800 |   ?    |   ?    | ... |   ?       |  // This row is (&buffer) + 2047
               +--------+--------+-   -+-----------+
    0x01400000 |   ?    |   ?    | ... |   ?       |  // This row is (&buffer) + 2048
               +--------+--------+-   -+-----------+
    

    Thus, if sane_read() returns a full buffer of 2048 bytes, the memory range being inserted to the vector of [&buffer, (&buffer) + bytes_read) is 4 MB of character data being read out of the memory region beginning at buffer, which is an enormous buffer overflow. Then, the call to data.insert() is interpreting those 4 MB as an array of pointers; if you're on a 64-bit system, that would be over 1 million pointers, whose contents are undefined. To be honest, I'm very surprised this didn't crash immediately.

    The proper way to construct the vector would be to declare it of type vector<SANE_Byte> (characters, not pointers), and copy each buffer into it on each loop iteration. The way to reference the right byte ranges of buffer is not to use &buffer to (&buffer) + bytes_read, but rather buffer to buffer + bytes_read, or equivalently &buffer[0] to &buffer[bytes_read]:

    vector<SANE_Byte> data;
    ...
    data.insert(data.end(), buffer, buffer + bytes_read);