algorithm image image-processing computer-vision sprite

How to automatically detect and crop individual sprite bounds in a sprite sheet?

Given a sprite sheet like this:

[Image: Sprite Sheet Example]

I would like to write an algorithm that can loop through the pixel data and determine the bounding rectangle of each discrete sprite.

Assume that for each pixel (X, Y) I can get either true (the pixel is not totally transparent) or false (the pixel is totally transparent). How would I go about automatically generating the bounding rectangles for each sprite?

The resulting data should be an array of rectangle objects with {x, y, width, height}.

Here's the same image but with the bounds of the first four sprites marked in light blue:

[Image: Sprite Sheet With Bounds]

Can anyone give a step-by-step on how to detect these bounds as described above?

Solution

Here's an approach:

1. Convert the image to grayscale
2. Apply Otsu's threshold to obtain a binary image
3. Perform morphological transformations to smooth the image
4. Find contours
5. Iterate through the contours to draw each bounding rectangle and extract each ROI

After converting to grayscale, we apply Otsu's threshold to obtain a binary image.

Next we perform morphological transformations (a close followed by a dilation) to merge each sprite into a single contour.
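To see why this step merges fragments, here is a minimal pure-Python sketch of binary dilation on a boolean grid. The `dilate` helper and its `radius` parameter are hypothetical names for illustration, not OpenCV's API; with `radius=1` it is meant to mimic a 3x3 square kernel, ignoring any differences in border handling.

```python
def dilate(mask, radius=1):
    """Binary dilation with a (2*radius+1) x (2*radius+1) square element.

    mask is a list of rows of booleans (True = foreground pixel).
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    out = [[False] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue
            # Switch on every pixel within `radius` of a foreground pixel,
            # clamping the window to the image borders.
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    out[ny][nx] = True
    return out


# One row with fragments at columns 0, 2 and 6.
row = [[True, False, True, False, False, False, True]]
print(dilate(row))
```

In this row the one-pixel gap between columns 0 and 2 fills in, so those two fragments become one blob, while the fragment at column 6 stays separate because a gap remains at column 4. Sprites that sit very close together can merge the same way, so the kernel size and iteration count may need tuning per sheet.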

From here we find contours, iterate through each contour, draw the bounding rectangle, and extract each ROI. Here's the result:

And here's each saved sprite ROI:

I've implemented this method using OpenCV and Python, but you can adapt the strategy to any language:

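import cv2

# Load the sprite sheet and keep an unmodified copy to crop from
image = cv2.imread('1.jpg')
original = image.copy()

# Grayscale, then inverted Otsu threshold so sprites become white on black
# (this assumes the sprites are darker than the background)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]

# Close small gaps, then dilate so each sprite forms one connected blob
kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3,3))
close = cv2.morphologyEx(thresh, cv2.MORPH_CLOSE, kernel, iterations=2)
dilate = cv2.dilate(close, kernel, iterations=1)

# Outer contours only; findContours returns 2 or 3 values depending on the OpenCV version
cnts = cv2.findContours(dilate, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
cnts = cnts[0] if len(cnts) == 2 else cnts[1]

# Save each sprite's bounding-box crop and draw the box for display
for sprite_number, c in enumerate(cnts):
    x,y,w,h = cv2.boundingRect(c)
    # Crop from the clean copy so rectangles drawn on `image` are not saved
    ROI = original[y:y+h, x:x+w]
    cv2.imwrite('sprite_{}.png'.format(sprite_number), ROI)
    cv2.rectangle(image, (x, y), (x + w, y + h), (36,255,12), 2)

cv2.imshow('thresh', thresh)
cv2.imshow('dilate', dilate)
cv2.imshow('image', image)
cv2.waitKey()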
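The question assumes only a per-pixel opaque/transparent test, so the contour step can also be approximated without OpenCV. The sketch below swaps in a different technique, a breadth-first flood fill over 8-connected pixels, and is not the contour tracing that `cv2.findContours` performs. The `sprite_bounds` function and the `mask` layout (a list of rows of booleans) are assumptions made for illustration; the output uses the {x, y, width, height} shape the question asks for.

```python
from collections import deque


def sprite_bounds(mask):
    """Return bounding boxes of 8-connected regions of True cells.

    mask[y][x] is True when the pixel is not fully transparent.
    Each box is a dict with keys x, y, width, height.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    boxes = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Flood fill from this unvisited opaque pixel, tracking extents.
            seen[y0][x0] = True
            queue = deque([(x0, y0)])
            min_x = max_x = x0
            min_y = max_y = y0
            while queue:
                x, y = queue.popleft()
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        nx, ny = x + dx, y + dy
                        if (0 <= nx < width and 0 <= ny < height
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((nx, ny))
            boxes.append({"x": min_x, "y": min_y,
                          "width": max_x - min_x + 1,
                          "height": max_y - min_y + 1})
    return boxes


# Tiny hand-made mask with two separate blobs ('#' = opaque pixel).
demo = [[ch == "#" for ch in row] for row in (
    "##....",
    "##..#.",
    "...###",
)]
print(sprite_bounds(demo))
```

A sprite made of several disconnected pieces would yield one box per piece with this sketch, so a dilation pass on the mask beforehand, or merging boxes that overlap afterwards, may still be needed.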