opencv machine-learning neural-network classification encog

Image classification example in Encog (or any framework)?


I need to classify images from a video camera. The main features to consider are:

  • Object Form (basic shape like triangle, square, etc)
  • Object Color
  • Few Deformations

I'm already working on shape recognition with OpenCV, following this Real Time Tracking Tutorial and this:

My goal is this: if I show a tiny or big square shape in front of the camera, it should be recognized as a square of color '....'; if I show a dog-eared/deformed piece of paper (square or triangle), the shape should still be recognized, for example as a triangle of color '....'.

I'm searching for how to do image classification with Encog, but what I found was classification using quantitative attributes such as measurements (length, width), not shape.

The Encog example is this one (available on Pluralsight).

In this Encog example the training data look like this:

Sepal Length    Sepal Width    Petal Length    Petal Width    Species
5.1             3.5            1.4             0.2            setosa
4.9             3.0            1.4             0.2            setosa
4.7             3.2            1.3             0.2            setosa
7.0             3.2            4.7             1.4            versicolor
6.4             3.2            4.5             1.5            versicolor
6.9             3.1            4.9             1.5            versicolor
6.3             3.3            6.0             2.5            virginica
5.8             2.7            5.1             1.9            virginica
7.1             3.0            5.9             2.1            virginica

In my case the training data, and also my evaluation data, would be pixels (mat type of encog).

How do I normalize pixels for Encog training data?

I need some clues or a tutorial. Many thanks.


Solution

  • Short answer: From a purely technical perspective, you need to subsample the images to about 100x100 pixels, convert them to grayscale (for shape recognition), flatten all pixels into a single vector, and normalize so that the largest possible pixel value maps to 1.0 (for example, if your pixel values are in the range [0..255], divide everything by 255). For color images, one usually creates three vectors, one per channel (RGB), normalizes them in the same way, concatenates them, and feeds the result into a neural network (MLP) classifier with at least one hidden layer. That is all very similar to the simple example you provide; it just uses a lot more data (10,000 inputs per 100x100 grayscale image instead of 4 measurements).

    Long answer: The above is probably the best you can do with Encog, and given enough samples and sufficient CPU/GPU resources it should work for your task. However, image recognition is currently an open problem, and no single universal method exists that will solve everything. Most work nowadays is done with convolutional neural nets (not supported by Encog), and there are a number of important things to consider, so you might want to read some classic image recognition papers to pick up the key ideas. If you need help with theory, I think it's best to ask your question here, as theory like that and tutorials are, to the best of my knowledge, outside the scope of SO.
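
    The preprocessing from the short answer can be sketched roughly as follows. This is a minimal illustration in plain Python, not Encog code: the helper names (`subsample`, `to_gray`, `gray_vector`, `color_vector`) are made up for this example, the image is assumed to be a list of rows of 8-bit `(r, g, b)` tuples, and the nearest-neighbour resize and integer luma weights are just one simple choice. In practice you would probably let OpenCV do the resizing and grayscale conversion and only keep the flatten-and-divide step.

```python
def subsample(image, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list of pixels (assumed non-empty)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def to_gray(rgb_image):
    """Convert (r, g, b) pixels to one intensity value using integer luma weights."""
    return [
        [(299 * r + 587 * g + 114 * b) // 1000 for (r, g, b) in row]
        for row in rgb_image
    ]


def gray_vector(gray_image, max_value=255):
    """Flatten a grayscale image row by row and scale values to [0.0, 1.0]."""
    return [p / max_value for row in gray_image for p in row]


def color_vector(rgb_image, max_value=255):
    """One normalized vector per channel, concatenated: all R, then all G, then all B."""
    flat = [px for row in rgb_image for px in row]
    return [px[ch] / max_value for ch in range(3) for px in flat]


if __name__ == "__main__":
    # Hypothetical 4x4 frame: red left half, white right half.
    frame = [[(255, 0, 0), (255, 0, 0), (255, 255, 255), (255, 255, 255)]
             for _ in range(4)]
    small = subsample(frame, 2, 2)            # would be (100, 100) for real frames
    shape_input = gray_vector(to_gray(small))  # input vector for shape recognition
    color_input = color_vector(small)          # input vector that keeps color
    print(len(shape_input), len(color_input))
```

    Each resulting list is one input row for the network, playing the same role as one line of the iris table in the question; the expected class (square, triangle, ...) still has to be supplied separately as the training target.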