I've written some image analysis software that can determine the basic shape, color, and dimensions of what it considers to be the most dominant object in the image.
I've also created a database of objects for the algorithm to choose from:
Item | Shape | Colors | Width range | Height range
Box | rectangle | brown, black, white | 20-50 cm | 10-30 cm
Basketball | circle | orange | 20-25 cm | 20-25 cm
Backpack | rectangle | black | 40-50 cm | 20-30 cm
. . . etc.
An example would be where the system detects a black rectangle that is 42 cm wide and 26 cm high. In this case, both 'box' and 'backpack' would qualify as correct answers. Are there any good ways to make an educated guess as to which of the two items it could be, such as a 75% chance it's a backpack and a 25% chance it's a box? (Such a guess could be based on the fact that boxes can be any of 3 different colors and cover a wider range of sizes, whereas the backpack can only be black.)
Other advice is also welcome. I'm having to teach myself about image recognition, so if there are other things I should be trying to pull out of an image, or a different way that I should be going about the database, those comments would also be greatly appreciated!
Apologies for the rather high-level description without much of a justification of why it works, but you can easily fill books answering that question and it's 1pm already, so I have to make it short:
In addition to recording the range of acceptable sizes for boxes and backpacks, you need to define a probability distribution over those sizes. Most likely you'd just go with a (2D) normal distribution over width and height, in which case you'd record the mean and the variance (a covariance matrix in the 2D case) instead of the range. Do the same for the shape, color, etc. variables, using a suitable probability distribution for each; for discrete variables such as these, that means a probability per possible value.
Then generate two data sets with a few hundred data points each, like this:
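To make the idea concrete for the example in the question, here is a minimal sketch of turning such distributions into a probability per object. Everything numeric in it is an assumption rather than something from the question: the means are the midpoints of the table's ranges, each standard deviation is a quarter of the range, colors are equally likely within an object, width and height are treated as independent (a diagonal covariance rather than a full 2D normal), and all objects are equally likely a priori.

```python
import math

# Assumed parameters (not measured): mean = midpoint of the range in the table,
# std = a quarter of the range width, colors uniform over the listed ones.
CLASSES = {
    "box": {
        "shape": {"rectangle": 1.0},
        "color": {"brown": 1 / 3, "black": 1 / 3, "white": 1 / 3},
        "width": (35.0, 7.5),    # (mean, std) in cm
        "height": (20.0, 5.0),
    },
    "basketball": {
        "shape": {"circle": 1.0},
        "color": {"orange": 1.0},
        "width": (22.5, 1.25),
        "height": (22.5, 1.25),
    },
    "backpack": {
        "shape": {"rectangle": 1.0},
        "color": {"black": 1.0},
        "width": (45.0, 2.5),
        "height": (25.0, 2.5),
    },
}


def normal_pdf(x, mean, std):
    """Density of a 1D normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior(shape, color, width, height, classes=CLASSES):
    """P(object | observation), assuming equal priors and independent features."""
    scores = {}
    for name, c in classes.items():
        scores[name] = (
            c["shape"].get(shape, 0.0)       # 0 if the shape never occurs
            * c["color"].get(color, 0.0)     # 0 if the color never occurs
            * normal_pdf(width, *c["width"])
            * normal_pdf(height, *c["height"])
        )
    total = sum(scores.values())
    if total == 0.0:
        return {name: 0.0 for name in scores}
    return {name: s / total for name, s in scores.items()}


# The detection from the question: a black rectangle, 42 cm x 26 cm.
result = posterior("rectangle", "black", 42.0, 26.0)
for name, p in sorted(result.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {p:.3f}")
```

Under these assumptions the backpack scores higher than the box, for the reason guessed at in the question: the box spreads its probability over three colors and a wider size range. The exact percentages depend entirely on the assumed parameters, so they would need to be replaced with values estimated from real measurements.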
p_1 = (shape=rectangle, color=black, width=12, height=34)
p_2 = (shape=circle, color=red, width=34, height=11)
...
For one of the sets, manually classify each point as the object that best matches its description. That will become your verification set.
Take the other data set and train a classification algorithm such as Fisher's linear discriminant on it. (This is a supervised method, so these points need class labels as well.) You obtain a transformation T that maximizes the "distance" between the classes (groups of data points representing an object) and minimizes the "distance" between points belonging to the same group.
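If you do not have a few hundred real detections yet, one way to get started is to sample synthetic points from the per-object distributions. The sketch below does that and splits the result into two sets. Note that it is a shortcut: the labels come from the sampling itself instead of manual classification, and the `PARAMS` table, `sample_point`, and `make_sets` are hypothetical names with assumed parameter values.

```python
import random

# Hypothetical per-object parameters: (mean, std) in cm, derived from the
# ranges in the question's table. These are assumptions, not measurements.
PARAMS = {
    "box": {"shape": "rectangle", "colors": ["brown", "black", "white"],
            "width": (35.0, 7.5), "height": (20.0, 5.0)},
    "backpack": {"shape": "rectangle", "colors": ["black"],
                 "width": (45.0, 2.5), "height": (25.0, 2.5)},
}


def sample_point(label, rng):
    """Draw one synthetic data point for the given object label."""
    p = PARAMS[label]
    return {
        "label": label,
        "shape": p["shape"],
        "color": rng.choice(p["colors"]),
        "width": rng.gauss(*p["width"]),
        "height": rng.gauss(*p["height"]),
    }


def make_sets(n_per_class=200, seed=0):
    """Return (training_set, verification_set), each holding half the points."""
    rng = random.Random(seed)
    points = [sample_point(label, rng)
              for label in PARAMS
              for _ in range(n_per_class)]
    rng.shuffle(points)
    half = len(points) // 2
    return points[:half], points[half:]


training_set, verification_set = make_sets()
print(len(training_set), len(verification_set))
print(training_set[0])
```

Synthetic data only reproduces whatever you assumed about the objects, so it is useful for testing the pipeline but not as a replacement for labeled real images.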
When your program detects a new object with the properties
o = (shape=rectangle, color=black, width=42, height=26)
you apply the transformation obtained from Fisher's LD and measure the correlation (scalar product) with the transformed data points you classified as each candidate object, i.e. calculate
(T*o)*(T*p_backpack)'
and
(T*o)*(T*p_box)'
These two values relate to the probability that the object o is actually a backpack or a box.
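The training and scoring steps can be sketched together for the two-class case. This is a simplified illustration under stated assumptions, not a full implementation: it uses only the numeric features (width, height), trains on synthetic points with assumed parameters, and computes the two-class Fisher direction w = Sw^-1 (m_backpack - m_box) by hand. It also swaps the comparison: instead of the scalar product with individual data points, it measures how far the projection of o lies from each projected class mean, which is a common simplification when T is a single direction.

```python
import random


def mean(vectors):
    """Component-wise mean of a list of 2D points."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(2)]


def scatter(vectors, m):
    """2x2 scatter matrix: sum over points of (v - m)(v - m)^T."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for v in vectors:
        d = [v[0] - m[0], v[1] - m[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s


def fisher_direction(class_a, class_b):
    """Two-class Fisher direction w = Sw^-1 (mean_a - mean_b)."""
    ma, mb = mean(class_a), mean(class_b)
    sa, sb = scatter(class_a, ma), scatter(class_b, mb)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [ma[0] - mb[0], ma[1] - mb[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    return w, ma, mb


def project(w, v):
    """Apply the 1D transformation: scalar product of w and v."""
    return w[0] * v[0] + w[1] * v[1]


# Synthetic (width, height) training points in cm; parameters are assumptions.
rng = random.Random(0)
boxes = [(rng.gauss(35.0, 7.5), rng.gauss(20.0, 5.0)) for _ in range(200)]
backpacks = [(rng.gauss(45.0, 2.5), rng.gauss(25.0, 2.5)) for _ in range(200)]

w, mean_backpack, mean_box = fisher_direction(backpacks, boxes)

# The newly detected object from the question.
o = (42.0, 26.0)
dist_backpack = abs(project(w, o) - project(w, mean_backpack))
dist_box = abs(project(w, o) - project(w, mean_box))
guess = "backpack" if dist_backpack < dist_box else "box"
print(guess, dist_backpack, dist_box)
```

The distances are not probabilities by themselves. To report something like "75% backpack", you would still need to calibrate them, for example by checking on the verification set how often each distance range corresponds to each object.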