I have a dataset with 200+ categorical variables (non-ordinal) and just a few continuous variables. I have tried one-hot encoding, but that greatly increases the number of dimensions and results in a poor score.
It seems like the regular scikit-learn tree can only be used with categorical variables that have been transformed with one-hot encoding (for non-ordinal variables), and I was wondering if there's a way to create a tree without one-hot encoding. I did some research and found that there's an API called h2o that might be useful, but I'm trying to find a way to run it on my local machine.
You can install the h2o-3 package for Python on your local machine, for example from h2o.ai/downloads or from PyPI.
The h2o package handles categorical values automatically and efficiently. It is recommended not to one-hot encode them first.
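As a rough sketch of what that can look like locally, assuming h2o was installed with `pip install h2o` and a Java runtime is available: the code below loads a CSV, marks the categorical columns as factors instead of one-hot encoding them, and trains a tree-based model. The function names, the CSV path, the column names, and the choice of a random forest with 50 trees are assumptions made for illustration, not something taken from your setup.

```python
def predictor_columns(all_columns, target):
    """Return every column name except the target, preserving order."""
    return [c for c in all_columns if c != target]


def train_forest_without_one_hot(csv_path, target, categorical_columns):
    """Train an H2O random forest on raw (not one-hot encoded) categoricals."""
    # Imported lazily so the helper above can be used without h2o installed.
    import h2o
    from h2o.estimators import H2ORandomForestEstimator

    h2o.init()  # starts (or connects to) a local H2O cluster; needs Java
    frame = h2o.import_file(csv_path)

    # Mark categorical columns as factors (enum) rather than one-hot encoding.
    # For classification, include the target column in this list as well.
    for col in categorical_columns:
        frame[col] = frame[col].asfactor()

    # Illustrative settings only; tune them for your data.
    model = H2ORandomForestEstimator(ntrees=50, seed=1)
    model.train(
        x=predictor_columns(frame.columns, target),
        y=target,
        training_frame=frame,
    )
    return model
```

A call might look like `model = train_forest_without_one_hot("data.csv", "label", ["color", "label"])`, where the file and column names are hypothetical.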
You can find extensive documentation at docs.h2o.ai.