Search code examples
multithreadingqtqimage

QT multithreaded QImage change


I am writing a software which does a lot of image operations / compositing on a lot of (potential big) images.
Multi threading helps a lot speed wise, but QT does not allow using multiple QPainter on the same image at the same time.
So i would have to do my image operation / compositing in each thread on a copy and then blit it back, which reduces the performance by a lot (depending on the use case obviously).

So i came up with an idea which seem to work, but feels extremely hacky.
I get the target image data (QImage::bits) pointer, and provide this to the worker thread.
In the worker thread i recreate a new QImage from the provided pointer. This means, no copy, no blitting. It seem to work fine, as long as i make sure each pixel/tile is only worked on in one thread and i don't detach the target image.

My question is: Is this safe and are there any other issues that could arise from this approach ?

Example code

QImage source = ...;
QImage target = ...;
QPainter::CompositionMode compositionMode = QPainter::CompositionMode_SourceOver;

// calculate tiles
QList<QRect> tiles;
for(int y = rect.top(); y < rect.top() + rect.height(); y += tileSize){
    for(int x = rect.left(); x < rect.left() + rect.width(); x += tileSize){
        QRect tile(
                    x, y,
                    x + tileSize > rect.left() + rect.width() ? rect.left() + rect.width() - x : tileSize,
                    y + tileSize > rect.top() + rect.height() ? rect.top() + rect.height() - y : tileSize
        );
        tiles.append(tile);
    }
}

// Get target pixel pointer and do threaded operation on each tile
uchar *targetPix = target.bits();
auto target_size = target.size();
auto targetFormat = target.format();
QList<int> lol = QtConcurrent::blockingMapped(tiles, [&target_size, &targetFormat, &source, targetPix, &compositionMode](const QRect &r){
    QImage tile_target(targetPix, target_size.width(), target_size.height(), targetFormat);
    QPainter p(&tile_target);
    p.setCompositionMode(compositionMode);
    // do you image operations here. For now we just do a simple draw
    p.drawImage(r.topLeft(), source, r);
    return 1; // In reallity this would return sensible data ;)
});

(This example increased the speed in my test by about 4.6 times btw. Depends on the operation and system of course.)


Solution

  • Short answer

    This is indeed tricky (but that's often needed when you want bleeding edge performance), but it should be (possible to make it) thread safe (for certain operations). Off course this depends on the operations you perform on the tile_target.

    It's up to you that you don't even access the bits outside the assigned tile (i.e. the portion of tile_target outside the rect r).

    Some considerations

    Ensure you only access the bits of the assigned tile

    As tile_target refers the whole image, it's up to you to ensure you don't access bits outside this target tile. Some problematic cases:

    • anti-aliasing: you will probably alter bits of neighboring while performing such an operation
    • Filtering: operations should as blurring often read the neighboring bits to calculate an 'averaged' value. Reading is not safe either if another thread may be writing the same bit simultaneously.

    Possible solution?: One option to allow accessing and/or writing to neighboring bits of a tile, is to divide your image in stripes and process your image in two steps:

    • first process the even stripes simultaneously
    • then process the uneven stripes simultaneously

    This procedures allows you to modify half of the neighboring stripes (useful for anti-aliasing) or (if nobody writes) to access all the bits of the next and/or previous stripe (useful for filtering purposes).

    This shouldn't decrease the efficiency significantly, if you create enough stripes to keep all your cpu(s) busy (i.e. typically twice the number of threads your cpu(s) support).

    Should I worry about detaching?

    That shouldn't be an issue with your current implementation. QImage::bits already detaches (if needed) the image (target) from any other possibly existing copy. As you perform the concurrent operation by blocking the calling thread. The original image (target) will exist at least as long as the tile_target images exist.

    Safer approach

    • Use a library that is dedicated to multi-threaded image processing, or at least allow referencing a subimage.

    • Pass a copy of the tile (see QImage::copy) to each thread and write the result back in the original image (either using mutexes or by performing this operation in the calling thread). Depending on the calculation insensitivity of the image operation, this extra copy may or may not be negligible. For the OP, this doesn't seem a viable option.

    Note that those safer approaches may generate (slightly) different results from the single threaded result in case of anti-aliasing or filtering. Those artifacts may be minimized by taking the tiles as large as possible (i.e. create not more tiles than the number of threads your cpu supports)

    Use GPU

    Image processing is often much faster when using your GPU, especially for things like filtering. But that's not something QImage supports out of the box.