I have to use this code:
val dt = new DecisionTreeClassifier().setLabelCol("indexedLabel").setFeaturesCol("indexedFeatures").setImpurity(impurity).setMaxBins(maxBins).setMaxDepth(maxDepth);
I need to add categorical features information so that the decision tree doesn't treat the indexedCategoricalFeatures
as numerical. I have this map:
val categoricalFeaturesInfo = Map(143 -> 126, 144 -> 5, 145 -> 216, 146 -> 100, 147 -> 14, 148 -> 8, 149 -> 19, 150 -> 7);
However, it only works with the DecisionTree.trainClassifier method. I can't use that method because it accepts different arguments than the ones I have. I would really like to be able to use DecisionTreeClassifier with categorical features treated properly.
Thank you for your help!
You're mixing two different APIs, which take different approaches to categorical data:
The RDD-based o.a.s.mllib API, which takes the required metadata as a categoricalFeaturesInfo map.
The Dataset (DataFrame) based o.a.s.ml API, which uses column metadata to determine variable types. If you correctly use ML transformers to create features, this should be handled automatically for you; otherwise you'll have to provide the metadata manually.
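Providing the metadata manually could look roughly like the sketch below. It is only an illustration under assumptions: the helper name withCategoricalMetadata, the input column name "features", and the total feature count of 151 are hypothetical (the question only shows that feature indices go up to 150). The idea is to turn the categoricalFeaturesInfo map into ML attributes and attach them to the vector column that DecisionTreeClassifier reads.

```scala
import org.apache.spark.ml.attribute.{Attribute, AttributeGroup, NominalAttribute, NumericAttribute}
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper: copies `inputCol` to `outputCol` and attaches metadata
// that marks the features listed in `categoricalInfo` as nominal.
def withCategoricalMetadata(
    df: DataFrame,
    inputCol: String,
    outputCol: String,
    numFeatures: Int,
    categoricalInfo: Map[Int, Int]): DataFrame = {

  // Feature index -> attribute: nominal with k values if listed, numeric otherwise.
  def attrFor(i: Int): Attribute = categoricalInfo.get(i) match {
    case Some(k) => NominalAttribute.defaultAttr.withIndex(i).withNumValues(k)
    case None    => NumericAttribute.defaultAttr.withIndex(i)
  }

  val attrs: Array[Attribute] = (0 until numFeatures).map(attrFor).toArray
  val metadata = new AttributeGroup(outputCol, attrs).toMetadata()

  // Column.as(alias, metadata) attaches the metadata to the new column.
  df.withColumn(outputCol, col(inputCol).as(outputCol, metadata))
}

// Same map as in the question.
val categoricalFeaturesInfo = Map(143 -> 126, 144 -> 5, 145 -> 216, 146 -> 100,
  147 -> 14, 148 -> 8, 149 -> 19, 150 -> 7)
```

Assuming a DataFrame df with a vector column "features" of size 151, calling withCategoricalMetadata(df, "features", "indexedFeatures", 151, categoricalFeaturesInfo) should produce the indexedFeatures column that the dt estimator in the question expects. The categorical values are assumed to be encoded already as 0 until k, and maxBins needs to be at least the largest category count (216 here).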