Tags: tensorflow, deep-learning, siamese-network

Does one-shot learning by definition allow only one training instance per class, and would this even be feasible?


I am confused by the different interpretations of one-shot learning found on the Internet. Wikipedia claims that "one-shot learning aims to classify objects from one, or only a few, examples". A nicely elaborated blog post on one-shot learning for assessing the similarity of written characters says that "you do not require too many instances of a class and only few are enough to build a good model". Later on, however, the model is trained with 20 samples per character.

Let us say I have a dataset of 200 images of tomatoes, of which 100 are rotten and 100 are flawless, and I want to create a one-shot learning approach using a Siamese neural network in order to spot rotten ones in the test data. From my understanding of the Wikipedia definition, I would have to select one image of the rotten class and one image of the flawless class (both from the training data), and use only this pair to train the model over a certain number of iterations.

  1. Is this even feasible? Since the model will see the same pair over and over again and nothing else, I am unsure if the model can really learn a robust feature representation of tomatoes based on which the similarity score is computed.
  2. Is my understanding of the definition of one-shot-learning correct or did I miss something completely?
  3. When testing, I intend to create pairs in which one member, A, is always a flawless tomato. The other member, B, can be flawless or rotten. If the model outputs a low similarity, I conclude that member B is rotten (according to the model). Is this approach correct?

Solution

  •  In the strictest sense, one-shot learning means that a machine learns from just one example of each class. It has to infer the big picture from very little information (humans are good at this; conventional machine learning often isn't). In practice, the term "one-shot learning" is frequently used for learning from a very small number of examples, not just one, which causes the confusion you describe.

     Training a Siamese network (link to short research paper) with only one picture per class (e.g., one rotten tomato and one flawless one) is a significant challenge. The network does not memorize the picture; it tries to learn a compact representation (a 'fingerprint', or feature summary) of the distinguishing features.
     How well this works depends on the complexity of those distinguishing features and on the network's ability to generalize. True one-shot learning from a single example is very hard and may not work well, especially for complex objects, so in practice we often cheat a little :).
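To make "learning a compact representation" concrete, here is a minimal sketch of the contrastive loss, one common objective for Siamese networks (an assumption here, since the original doesn't name the loss): it pulls same-class embeddings together and pushes different-class embeddings at least a margin apart. The embedding vectors below are toy values standing in for the network's output.

```python
import numpy as np

def contrastive_loss(emb1, emb2, same_class, margin=1.0):
    """Contrastive loss sketch: pull same-class embeddings together,
    push different-class embeddings at least `margin` apart."""
    d = np.linalg.norm(emb1 - emb2)  # Euclidean distance between embeddings
    if same_class:
        return d ** 2                # penalize any distance for same class
    return max(0.0, margin - d) ** 2 # penalize only pairs closer than margin

# Toy embeddings (hypothetical network outputs):
a = np.array([0.1, 0.2])
b = np.array([0.15, 0.25])  # nearly identical embedding
c = np.array([0.9, 0.8])    # distant embedding

print(contrastive_loss(a, b, same_class=True))   # small: pair already close
print(contrastive_loss(a, c, same_class=False))  # zero: pair already margin apart
```

Minimizing this over many pairs is what shapes the 'fingerprint' space; with a single pair, there is simply very little signal to shape it with.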

     You've got the right idea about one-shot learning: very few examples are used. However, expecting robust performance from just one example per class may be too optimistic for complex tasks without extra help. The magic of one-shot learning hinges on the model's ability to learn generalizable features from limited samples.
     Your testing idea, with pairs in which one member is always a flawless tomato, is exactly how Siamese networks are typically used. The model doesn't give a label like "rotten" or "good"; it gives a score for how similar the two inputs are, and you decide based on a threshold on that score.
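Your test-time protocol can be sketched as follows. This is a minimal illustration with made-up embedding vectors and a made-up threshold in place of a trained network's outputs; cosine similarity is one common choice of score (an assumption, not something the question specifies).

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify_pair(emb_a, emb_b, threshold=0.5):
    """Member A is a known-flawless tomato; if the pair scores below
    the threshold, member B is flagged as rotten."""
    return "rotten" if cosine_similarity(emb_a, emb_b) < threshold else "flawless"

# Toy embeddings standing in for the Siamese network's output:
flawless_ref = np.array([1.0, 0.0, 0.2])
similar      = np.array([0.9, 0.1, 0.2])   # close to the reference
dissimilar   = np.array([-0.8, 1.0, 0.0])  # far from the reference

print(classify_pair(flawless_ref, similar))     # high similarity -> "flawless"
print(classify_pair(flawless_ref, dissimilar))  # low similarity  -> "rotten"
```

In practice the threshold is tuned on held-out pairs rather than fixed in advance.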

    Also, how well this works in the real world depends on how hard the problem is, the type of model, and the data you give it. Models are often pre-trained on larger datasets of similar data to give them a head start, and variations of the same data (rotating, colour shifting, etc.) are generated to boost the training set size.
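The "variations of the same data" trick mentioned above can be sketched like this: a single image is turned into several training samples via flips, rotations, and a brightness shift. This is a hypothetical minimal pipeline using NumPy array operations, not any particular library's augmentation API.

```python
import numpy as np

def augment(image):
    """Generate simple variants of one image: flips, a 90-degree
    rotation, and a brightness boost (hypothetical pipeline)."""
    return [
        image,
        np.fliplr(image),              # horizontal flip
        np.flipud(image),              # vertical flip
        np.rot90(image),               # 90-degree rotation
        np.clip(image * 1.2, 0, 255),  # brightness boost, clipped to range
    ]

# One 4x4 grayscale "image" becomes five training samples:
img = np.arange(16, dtype=float).reshape(4, 4)
print(len(augment(img)))  # 5
```

With only one or two real images per class, augmentation like this (plus pre-training) is usually what makes "one-shot" setups workable at all.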

    I hope this answers your question; if not, feel free to leave a comment :)
