python, pytorch, object-detection

Create multiple images with different ratios in PyTorch


I'm trying to perform digit recognition using PyTorch. I have implemented a convolutional version of a 32x32 sliding window, which lets me identify digits of roughly that size in a picture.
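
For context, the idea is roughly the following (a simplified sketch, not my exact architecture; the layer sizes and the ConvSlidingWindow name are only illustrative): the fully connected head is written as convolutions, so the 32x32 classifier can slide over larger inputs and produce a grid of scores.

import torch
import torch.nn as nn

class ConvSlidingWindow(nn.Module):
    # 32x32 digit classifier whose dense head is written as convolutions,
    # so it can slide over larger images and return a score map
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 32 -> 16
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),  # 16 -> 8
        )
        # an 8x8 convolution plays the role of a Linear layer on a 32x32 crop
        self.head = nn.Sequential(
            nn.Conv2d(32, 128, kernel_size=8), nn.ReLU(),
            nn.Conv2d(128, num_classes, kernel_size=1),
        )

    def forward(self, x):
        return self.head(self.features(x))  # (N, num_classes, H', W') score map

# a 32x32 input gives a 1x1 score map; a 300x300 input gives a 68x68 grid
print(ConvSlidingWindow()(torch.randn(1, 3, 300, 300)).shape)  # torch.Size([1, 10, 68, 68])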

But now let's imagine I have an image of size 300x300 with a digit that occupies the whole image. I will never be able to identify it...

I have seen people saying that the image needs to be rescaled and resized, meaning that I need to create several scaled versions of my initial image and then feed my network with those "new" images.
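
My rough idea looks something like this (just a sketch of what I mean; the build_pyramid helper and the scale factors are made up), but I am not sure it is the right approach:

import torchvision.transforms.functional as TF

def build_pyramid(img, scales=(1.0, 0.75, 0.5, 0.25)):
    # return rescaled copies of a (C, H, W) image tensor, one per scale factor
    _, h, w = img.shape
    return [TF.resize(img, [max(1, int(h * s)), max(1, int(w * s))]) for s in scales]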

Does anyone have any idea how I can perform that?

Here is part of my code, in case it helps:

# loading dataset
import torch
from torch.utils.data import DataLoader
from torchvision import transforms

size = 200
height = 200
width = 300
batch_size = 64  # placeholder value, not specified in the original snippet

transformer_svhn_test = transforms.Compose([
    transforms.Grayscale(3),
    transforms.Resize((height, width)),
    transforms.CenterCrop((size, size)),
    transforms.ToTensor(),
    transforms.Normalize([.5,.5,.5], [.5,.5,.5])
])

# SVHN_ is a custom dataset class defined elsewhere in my code
SVHN_test = SVHN_(train=False, transform=transformer_svhn_test)
SVHN_test_loader = DataLoader(SVHN_test, batch_size=batch_size, shuffle=False, num_workers=3)

# loading network (Network is my own model class)
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = Network()
model.to(device)
model.load_state_dict(torch.load("digit_classifier_gray_scale_weighted.pth", map_location=device))

# loading one image and feeding the model with it
image = next(iter(SVHN_test_loader))[0][0]  # first image of the first batch, shape (3, 200, 200)
image_tensor = image.unsqueeze(0)           # creating a single-image batch, shape (1, 3, 200, 200)
image_tensor = image_tensor.to(device)

model.eval()
with torch.no_grad():                       # no gradients needed at inference time
    output = model(image_tensor)

Solution

  • Please correct me if I have misunderstood your question:

    Your network takes 300x300 images as input, performs a 32x32 sliding-window operation within the model, and outputs the locations of any digits in the input images? In that setup, you are framing the problem as an object detection task.

    I imagine the digits in your training data have sizes similar to 32x32, and you want to use multi-scale evaluation to make sure the digits in your test images also end up at sizes similar to those in your training data. For an object detection network, the input size of the network is not fixed.

    So what you need is called multi-scale evaluation/testing, and you will find it very common in computer vision tasks.

    A good starting point would be HERE; a minimal sketch of the idea follows below.
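
    Assuming your model is fully convolutional and returns one output (e.g. a score map) per image, multi-scale testing can look roughly like this (the multi_scale_predict helper and the scale factors are just illustrative; how you aggregate the per-scale results is up to you):

import torch
import torch.nn.functional as F

@torch.no_grad()
def multi_scale_predict(model, image, scales=(0.25, 0.5, 1.0, 2.0)):
    # run the model on several rescaled copies of a (1, C, H, W) image
    # and collect one output (e.g. a score map) per scale
    model.eval()
    outputs = {}
    for s in scales:
        scaled = F.interpolate(image, scale_factor=s, mode='bilinear', align_corners=False)
        outputs[s] = model(scaled)
    return outputs

# usage with the setup from the question:
# results = multi_scale_predict(model, image_tensor)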