image machine-learning classification keras multilabel-classification

Machine Learning: Image classification into 3 classes (Dog or Cat or Neither) using Convolutional NN

I would appreciate a bit of help in thinking this through. I have a classifier that can categorize the images into either dog or cat successfully with good accuracy. I have a good data set to train the classifier on. So far no problem.

I have about 20,000 dog and 20,000 cat images.

However, when I present other images, such as a car, a building, or a tiger, that contain neither a dog nor a cat, I would like the classifier to output "Neither". Right now, of course, the classifier forces everything into Dog or Cat, which is incorrect.

Question 1:

How can I achieve this? Do I need to have a third set of images that does not contain dogs or cats and train the classifier on these additional images to recognize everything else as "Neither"?

Roughly how many non-dog/cat images would I need to get good accuracy? Would about 50,000 images be enough, given that the non-dog/cat image domain is so huge, or do I need even more?

Question 2:

Instead of training my own classifier on my own image data, can I use the ImageNet-trained VGG16 Keras model for the initial layers and add the Dog/Cat/Neither classifier on top as a fully connected layer?

See this example to load a pre-trained imagenet model

Thanks much for your help.

Solution

Question 2

I'll take the "killer" heuristic first. Yes, use the existing trained model. ImageNet already includes many dog breeds and several domestic cat breeds as separate classes, so simply merge all of the dog classes into your class 1, the cat classes into class 2, and everything else into class 0. This will solve virtually all of your problem.
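As a rough sketch of that merging step, the snippet below sums a 1000-way ImageNet probability vector into three scores. The index ranges are an assumption based on the commonly used ImageNet-1k ordering (dog breeds at 151 to 268, domestic cats at 281 to 285); check them against the class index file that ships with your model. The names `collapse_imagenet_probs` and `classify` are made up for illustration.

```python
# Assumed ImageNet-1k index ranges; verify against your model's class index file.
DOG_IDS = range(151, 269)   # 151..268: domestic dog breeds (assumption)
CAT_IDS = range(281, 286)   # 281..285: domestic cat breeds (assumption)

LABELS = ("neither", "dog", "cat")  # class 0, class 1, class 2


def collapse_imagenet_probs(probs):
    """Sum a 1000-way probability vector into [neither, dog, cat] scores."""
    if len(probs) != 1000:
        raise ValueError("expected 1000 ImageNet class probabilities")
    dog = sum(probs[i] for i in DOG_IDS)
    cat = sum(probs[i] for i in CAT_IDS)
    neither = sum(probs) - dog - cat  # everything that is not a dog or cat class
    return [neither, dog, cat]


def classify(probs):
    """Return the label with the largest merged score."""
    scores = collapse_imagenet_probs(probs)
    best = max(range(len(scores)), key=scores.__getitem__)
    return LABELS[best]
```

Here `probs` would be one row of the output of the pre-trained VGG16 model's `predict` call on a preprocessed image. Note that wild relatives such as wolves, lynxes, and tigers have their own ImageNet classes outside these ranges, so under this mapping they fall into "neither".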

Question 1

The problem is that your initial model has been trained as if everything in the world (all 40,000 images) were either a dog or a cat. Yes, you have to train on a third set, unless your training method is a self-limiting algorithm, such as a one-class SVM (fitted once for each class). Even then, I expect that you'd have some trouble excluding a lynx or a wolf.

You're quite right that you'll need plenty of examples for the "neither" class, given the high dimensionality of the input space. What matters is not so much the quantity of images as their placement just "over the boundary" from a cat or dog. I'd be interested in a project to determine how to do this with minimal additional input.

In short, don't simply grab 50K images from an ImageNet-style collection; choose those that will give your model the best discrimination: other feline and canine examples, and other objects you find in similar environments (end table, field rodent, etc.).
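One possible way to find those "over the boundary" images, which is my suggestion rather than something spelled out above, is hard-negative mining: run your existing two-class model over a pool of images known to contain neither animal, and keep the ones it most confidently calls dog or cat. The sketch below assumes you already have those scores in a dict; `rank_hard_negatives` and the file names are hypothetical.

```python
def rank_hard_negatives(scores, k):
    """Pick the k non-dog/cat images the current model is most fooled by.

    scores maps an image name to (p_dog, p_cat) from the existing
    two-class model. Every image in scores is assumed to contain
    neither a dog nor a cat, so a high probability is a confident mistake.
    """
    ranked = sorted(scores, key=lambda name: max(scores[name]), reverse=True)
    return ranked[:k]


# Illustrative scores only; real values would come from your model.
candidate_scores = {
    "wolf.jpg": (0.97, 0.03),
    "lynx.jpg": (0.10, 0.90),
    "end_table.jpg": (0.60, 0.40),
    "car.jpg": (0.55, 0.45),
}
hard_negatives = rank_hard_negatives(candidate_scores, k=2)
```

The selected images would then go into the "neither" training set ahead of randomly chosen ones, on the assumption that confident mistakes tell the model more about where the boundary lies.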