Tags: machine-learning, decision-tree

Are decision trees (e.g. C4.5) considered nonparametric learning methods?


I am relatively new to machine learning and am trying to place decision tree induction into the grand scheme of things. Are decision trees (for example, those built with C4.5 or ID3) considered parametric or nonparametric? My guess is that they may indeed be parametric, because the split points for real-valued attributes could be determined from some distribution of feature values, for example the mean. On the other hand, they do not share the nonparametric characteristic of having to retain all of the original training data (as one would with kNN).
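(For context, here is a rough illustration, not part of the original question: a C4.5/ID3-style split threshold for a numeric attribute is chosen by scanning candidate cut points taken from the observed training values and maximizing information gain, rather than being computed from a distributional parameter such as the mean. The helper names below are made up for the sketch.)

```python
# Hypothetical sketch: how a C4.5/ID3-style tree picks a numeric split point.
# The threshold comes from the observed training values themselves (chosen to
# maximize information gain), not from a fitted distribution parameter.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Scan midpoints between adjacent sorted observed values; keep the
    split with the highest information gain."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_t = 0.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_t, best_gain

print(best_threshold([1.0, 2.0, 3.0, 4.0, 10.0, 11.0],
                     ["a", "a", "a", "a", "b", "b"]))
# -> (7.0, ~0.918): the cut lands between observed values,
#    not at the feature mean (~5.17).
```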


Solution

  • The term "parametric" refers to parameters that define the distribution of the data. Since decision trees such as C4.5 make no assumption about the distribution of the data, they are nonparametric. Gaussian Maximum Likelihood Classification (GMLC) is parametric because it assumes the data follow a multivariate Gaussian distribution (each class is characterized by a mean vector and covariance matrix). Regarding your last sentence, retaining the training data (as in instance-based learning) is not common to all nonparametric classifiers. For example, artificial neural networks (ANNs) are considered nonparametric, yet they do not retain the training data.
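    As a rough illustration (my sketch, not from the original answer), the snippet below contrasts the two families: a decision tree learns its split thresholds directly from the training data with no distributional assumption, while a Gaussian classifier is fully described by a fixed set of fitted parameters. I am using scikit-learn's QuadraticDiscriminantAnalysis here as a stand-in for GMLC, since it likewise fits per-class means and covariances under a Gaussian assumption.

    ```python
    # Sketch: nonparametric tree vs. parametric Gaussian classifier.
    # Assumes scikit-learn is installed; QuadraticDiscriminantAnalysis stands in
    # for Gaussian Maximum Likelihood Classification (per-class means/covariances).
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

    X, y = make_classification(n_samples=300, n_features=4, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Nonparametric: the tree learns split thresholds from the data itself;
    # no distributional form is assumed, and its structure grows with the data.
    tree = DecisionTreeClassifier(criterion="entropy").fit(X_train, y_train)

    # Parametric: QDA assumes each class is multivariate Gaussian and is fully
    # described by a fixed set of parameters (class means and covariances).
    qda = QuadraticDiscriminantAnalysis(store_covariance=True).fit(X_train, y_train)

    print("tree accuracy:", tree.score(X_test, y_test))
    print("QDA accuracy :", qda.score(X_test, y_test))
    print("QDA class means (the fitted parameters):\n", qda.means_)
    ```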