I was reading a paper by J. Welbl (http://hci.iwr.uni-heidelberg.de/publications/mip/techrep/welbl_14_casting.pdf) which suggests merging convolutional neural networks with decision trees. Here is a picture taken from the paper mentioned above:
That made me think, and I couldn't answer what the difference between those two methods is. Since in CNNs backpropagation and gradient descent are used to train the decisions, couldn't that just as easily be applied to decision trees, and hence shouldn't the output be the same?
Am I right, or have I gone totally wrong?
In fact decision trees and CNNs have nearly nothing in common. These models are completely different in the way they are built (in particular, you do not train a DT through gradient descent, and it cannot represent linear relations between features, and so on), in the way they are trained, and in their general characteristics. You can "convert" a DT into a neural network (but not the other way around!), yet you can do the same with (nearly) any model, and it does not mean that everything is a neural network. It only shows how vague neural networks are in general.
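To make that "conversion" direction concrete, here is a minimal toy sketch (not the paper's exact construction, and the tree, weights and thresholds are made up for illustration): each split node becomes a neuron with a hard threshold, and a leaf becomes a neuron that fires only when all splits on its path fire.

```python
# Hand-written tree: if x0 > 2.0 then (class 1 if x1 > 1.0 else class 0) else class 0.
import numpy as np

def step(z):
    return (z > 0).astype(float)  # hard threshold activation

def tree_predict(x):
    # plain decision-tree logic, for comparison
    if x[0] > 2.0:
        return 1 if x[1] > 1.0 else 0
    return 0

def net_predict(x):
    # Layer 1: one neuron per split node, firing when its condition holds.
    W1 = np.array([[1.0, 0.0],    # encodes x0 > 2.0
                   [0.0, 1.0]])   # encodes x1 > 1.0
    b1 = np.array([-2.0, -1.0])
    h = step(W1 @ x + b1)

    # Layer 2: one neuron per leaf; fires only if all splits on its path hold
    # (logical AND of h0 and h1, written as h0 + h1 - 1.5 > 0).
    W2 = np.array([[1.0, 1.0]])
    b2 = np.array([-1.5])
    leaf = step(W2 @ h + b2)
    return int(leaf[0])

for x in [np.array([3.0, 2.0]), np.array([3.0, 0.5]), np.array([1.0, 5.0])]:
    assert tree_predict(x) == net_predict(x)
```

Note the conversion only reproduces the tree's decisions; nothing about the resulting network is trained by gradient descent, and with hard thresholds it could not be.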
Now for more details. First, the paper is talking about ANNs (artificial neural networks), not CNNs (convolutional neural networks). Second, ANNs are such a general setting in terms of computability that you can express every non-looping/non-recurrent computation as an ANN. Furthermore, once you go into recurrent networks you can actually show that they are Turing complete (thus every algorithm can be represented as an RNN). The only thing missing is the why, and usually there is no point in doing so. The authors here simply claim that they can further fine-tune an RF represented as an ANN, and to be honest the results provided are barely convincing (by forgetting about the RF as such, you lose its extreme simplicity, ease of parallelization, small number of hyperparameters, ease of use, and so on). In particular, you can fine-tune an RF in hundreds of ways, and obviously by fitting the next hyperparameter (which they do) and validating on it you will improve the scores. But there is no depth in what is shown here (and for sure no suggestion that RFs are neural networks :-)). As I said, you can think about any non-looping/non-recurrent model as an ANN, yet it does not mean that a particular model (such as a DT) is an ANN.
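If you want a feel for what "fine-tune an RF represented as an ANN" means in general (this is a hedged sketch of the idea, not the paper's specific procedure), the trick is to replace the hard step in the previous toy example with a sigmoid, so the split thresholds become differentiable parameters that gradient descent can nudge; `beta`, `b1` and the single training pair below are made-up illustration values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_net(x, b1, beta=5.0):
    # same structure as the hard-threshold net above, but differentiable
    W1 = np.array([[1.0, 0.0], [0.0, 1.0]])
    h = sigmoid(beta * (W1 @ x + b1))           # soft split decisions
    return sigmoid(beta * (h[0] + h[1] - 1.5))  # soft leaf / output

# One numerical-gradient step on the thresholds b1 for a single (input, target)
# pair, just to show that the former split thresholds are now trainable.
x, y = np.array([2.1, 1.2]), 0.0
b1 = np.array([-2.0, -1.0])
eps, lr = 1e-5, 0.5
grad = np.zeros_like(b1)
for i in range(2):
    d = np.zeros_like(b1); d[i] = eps
    grad[i] = ((soft_net(x, b1 + d) - y) ** 2 - (soft_net(x, b1 - d) - y) ** 2) / (2 * eps)
b1 = b1 - lr * grad  # the thresholds have moved slightly toward lowering the error
```

Whether that extra tuning is worth giving up the simplicity of the plain RF is exactly the point I doubt above.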