In the context of Multi-Class Classification (MCC) problem, a common approach is to build final solution from multiple binary classifiers. Two composition strategy typically mentioned are one-vs-all and one-vs-one.
In order to distinguish the approach, it is clearer to look at what each binary classifier attempt to do. One-vs-all's primitive classifier attempt to separate just one class from the rest. Whereas one-vs-one's primitive attempts to separate one against One-vs-one is also, quite confusingly, called all-vs-all and all-pairs.
I want to investigate this rather simple idea of building MCC classifier by composing binary classifier in binary-decision-tree-like manner. For an illustrative example:
has wings?
/ \
quack? nyan?
/ \ / \
duck bird cat dog
As you can see the has wings?
does a 2-vs-2 classification,
so I am calling the approach many-vs-many.
The problem is, I don't know where to start reading.
Is there a good paper you would recommend?
To give a bit more context, I'm considering using a multilevel evolutionary algorithm (MLEA) to build the tree. So if there is an even more direct answer, it would be most welcomed.
Edit: For more context (and perhaps you might find it useful), I read this paper which is one of the GECCO 2011 best paper winners; It uses MLEA to compose MCC in one-vs-all manner. This is what inspired me to look for a way to modify it as decision tree builder.
Sailesh's answer is correct in that what you intend to build is a decision tree. There are many algorithms already for learning such trees such as e.g. Random Forests. You could e.g. try weka and see what is available there.
If you're more interested in evolutionary algorithms, I want to mention Genetic Programming. You can try for example our implementation in HeuristicLab. It can deal with numeric classes and attempts to find a formula (tree) that maps each row to its respective class using e.g. mean squared error (MSE) as fitness function.
There are also instance-based classification methods like nearest neighbor or kernel-based methods like support vector machines. Instance-based method also support multiple classes, but with kernel-methods you have to use one of the approaches you mentioned.