ios, objective-c, gpu, gpuimage, hough-transform

Using GPUImage and GPUImageHoughTransformLineDetector to detect highlighted text bounding box


I am using GPUImageHoughTransformLineDetector to try to detect the highlighted text in the image:

[Image: a page of text with one passage enclosed in a blue highlight box]

I am using the following code to try to detect the lines of the blue bounding box:

GPUImagePicture *stillImageSource = [[GPUImagePicture alloc] initWithImage:rawImage];
GPUImageHoughTransformLineDetector *lineFilter = [[GPUImageHoughTransformLineDetector alloc] init];
[stillImageSource addTarget:lineFilter];

// Generator that draws the detected lines for visualization.
GPUImageLineGenerator *lineDrawFilter = [[GPUImageLineGenerator alloc] init];
[lineDrawFilter forceProcessingAtSize:rawImage.size];

__weak typeof(self) weakSelf = self;
[lineFilter setLinesDetectedBlock:^(GLfloat *flt, NSUInteger count, CMTime time) {
    NSLog(@"Number of lines: %lu", (unsigned long)count);

    // Blend the drawn lines over the original image.
    GPUImageAlphaBlendFilter *blendFilter = [[GPUImageAlphaBlendFilter alloc] init];
    [blendFilter forceProcessingAtSize:rawImage.size];
    [stillImageSource addTarget:blendFilter];
    [lineDrawFilter addTarget:blendFilter];

    [blendFilter useNextFrameForImageCapture];
    [lineDrawFilter renderLinesFromArray:flt count:count frameTime:time];
    weakSelf.doneProcessingImage([blendFilter imageFromCurrentFramebuffer]);
}];
[stillImageSource processImage];

Every time I run this, regardless of the edgeThreshold or lineDetectionThreshold values, I detect 1023 lines, and the resulting output looks like:

[Image: the detection output, with the text covered by a dense mesh of detected lines]

It is unclear to me why changing the threshold does not do anything, but I am sure I am misunderstanding something. Anyone have any ideas on how to best do this?


Solution

  • I just made some improvements to the Hough transform line detector in the framework that will help with this, but you're going to need to do some additional preprocessing to your image to pick out just that blue box.

    Let me explain how this operation works. First, it detects edges in an image. For each pixel determined to be an edge (right now, I'm using a Canny edge detector for this), the coordinate of that pixel is extracted. Each of those coordinates is then used to draw a pair of lines in parallel coordinate space (based on the process described in "Real-Time Detection of Lines using Parallel Coordinates and OpenGL" by Dubská, et al.).
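
    To make the voting step concrete, here is a rough CPU-side sketch (the framework does this on the GPU; the axis distance d, the vote callback, and the function name are illustrative, not part of GPUImage):

    // Each edge pixel (x, y) becomes a pair of segments in parallel
    // coordinate space: axis u = -d carries -y, u = 0 carries x, and
    // u = d carries y. Where many such segments cross, the accumulator
    // peaks, and each peak maps back to a line in the image.
    static void voteForEdgePixel(float x, float y, int d,
                                 void (^vote)(int u, float v))
    {
        for (int u = 0; u <= d; u++) {
            // "Straight" half: segment from (0, x) to (d, y).
            vote(u, x + (y - x) * (float)u / (float)d);
            // "Twisted" half: segment from (0, x) to (-d, -y).
            vote(-u, x + (-y - x) * (float)u / (float)d);
        }
    }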

    Pixels in parallel coordinate space where lines intersect will increase in intensity. The points of greatest intensity in parallel coordinate space indicate the presence of a line in the real-world scene.

    However, only the pixels that are local maxima for intensity indicate real lines. The challenge is in determining local maxima so that noise from busy scenes is suppressed. That's what I haven't totally solved in this operation. In your image above, the huge number of lines is due to a mess of points rising above the detection threshold in parallel coordinate space without being properly removed as non-maxima.

    I did make some improvements, though, so I am getting a cleaner output from the operation now (I just did this quickly off a live video feed of my screen):

    [Image: cleaner line detection output, captured from a live video feed of the screen]

    I fixed a bug in the local non-maximum suppression filter and expanded the area it works over from 3x3 to 5x5. It's still leaving behind a bunch of non-maximum points which contribute to noise, but it's much better.
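
    For reference, the non-maximum suppression step amounts to something like this CPU sketch (the framework does it in a fragment shader; the accumulator layout and names here are assumptions):

    // A parallel-coordinate accumulator cell survives only if it exceeds
    // the detection threshold and no neighbor in its 5x5 window is more
    // intense.
    static BOOL isLocalMaximum(const float *accumulator, int width, int height,
                               int x, int y, float threshold)
    {
        float center = accumulator[y * width + x];
        if (center < threshold) return NO;
        for (int dy = -2; dy <= 2; dy++) {
            for (int dx = -2; dx <= 2; dx++) {
                int nx = x + dx, ny = y + dy;
                if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                if (accumulator[ny * width + nx] > center) return NO;
            }
        }
        return YES;
    }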

    You'll notice this still doesn't quite do what you want. It's picking up lines in the text, but not your box. That's because the black text on a white background produces very strong, very sharp edges at the edge detection stage, but the light blue selection box on a white background needs an extremely low threshold to even be picked up in any edge detection process.

    If you're always going to be picking out a blue selection box, what I'd recommend is that you run a preprocessing operation to uniquely identify blue objects in the scene. A simple way to do this would be to define a custom filter that subtracts the red component from the blue for each pixel, flooring negative values and taking the result of that calculation as the output for the red, green, and blue channels. You might even want to multiply the result by 2.0-3.0 to intensify this difference.
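
    As a sketch, such a filter could be built on GPUImageFilter's custom fragment shader support; the shader body and the 3.0 gain below are starting points to tune, not fixed values:

    NSString *const kBlueBoxFragmentShader = SHADER_STRING
    (
     varying highp vec2 textureCoordinate;
     uniform sampler2D inputImageTexture;

     void main()
     {
         lowp vec4 color = texture2D(inputImageTexture, textureCoordinate);
         // Blue minus red, floored at zero, then amplified: blue areas
         // go toward white, text and background toward black.
         lowp float blueness = clamp((color.b - color.r) * 3.0, 0.0, 1.0);
         gl_FragColor = vec4(vec3(blueness), color.a);
     }
    );

    GPUImageFilter *blueBoxFilter =
        [[GPUImageFilter alloc] initWithFragmentShaderFromString:kBlueBoxFragmentShader];

    // Insert it ahead of the line detector:
    // stillImageSource -> blueBoxFilter -> lineFilter
    [stillImageSource addTarget:blueBoxFilter];
    [blueBoxFilter addTarget:lineFilter];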

    The result of that should be an image where the blue areas of your image show up as white and everything else as black. That'll greatly improve the contrast around your selection box and make it much easier to pick out from the text. You'll need to experiment with the parameters to get this as reliable as you need for your case.