For testing some code I want to be able to create a sklearn.tree._tree.Tree by hand, rather than by fitting to some data.
For concreteness, let's say I want a tree that classifies points on the real line into the intervals (-infinity, 5], (5, 6] or (6, infinity). I want the tree shaped like this:
```
----0----
|       |
|    ---2---
|    |     |
1    3     4
```
where node 0 splits the real line at 5 and node 2 splits the real line at 6.
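To pin down what the hand-built tree is supposed to do, here is a minimal plain-Python sketch of the intended routing that a test could compare against. It assumes scikit-learn's convention that a sample goes to the left child when its feature value is <= the threshold, which is what makes node 0's left branch (-infinity, 5] and node 2's left branch (5, 6]. `SPLITS` and `leaf_for` are hypothetical names for illustration only.

```python
# Reference model of the intended tree, written without scikit-learn.
# Assumption: like scikit-learn, a sample goes left when x <= threshold.

# Hypothetical table: node id -> (left child, right child, threshold).
# Leaves (1, 3, 4) have no entry.
SPLITS = {0: (1, 2, 5.0), 2: (3, 4, 6.0)}


def leaf_for(x: float) -> int:
    """Return the id of the leaf that x ends up in, starting from the root."""
    node = 0
    while node in SPLITS:
        left, right, threshold = SPLITS[node]
        node = left if x <= threshold else right
    return node


for x in (-3.0, 5.0, 5.5, 6.0, 6.5):
    print(x, leaf_for(x))
```

The boundary points 5.0 and 6.0 land in the left leaf of their split, matching the half-open intervals in the question.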
How can I do this? I see that trees have a __setstate__ method, and judging from the output of __getstate__, it looks like I need something like:
```python
import numpy as np

state = {
    'n_features_': 1,
    'max_depth': 2,
    'node_count': 5,
    'nodes': np.array(
        [(1,  2,  0,  5.,  0.375, 3, 3.),
         (-1, -1, 0, -2.,  0.,    1, 1.),
         (3,  4,  0,  6.,  0.,    2, 2.),
         (-1, -1, 0, -2.,  0.,    1, 1.),
         (-1, -1, 0, -2.,  0.,    1, 1.)],
        dtype=[('left_child', '<i8'), ('right_child', '<i8'),
               ('feature', '<i8'), ('threshold', '<f8'),
               ('impurity', '<f8'), ('n_node_samples', '<i8'),
               ('weighted_n_node_samples', '<f8')]),
}
```
But I don't really understand what these parameters mean, and in any case I don't see how to create a tree to apply this state to in the first place.
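One way to see the exact state layout your installed scikit-learn expects is to fit a throwaway tree and inspect its `tree_.__getstate__()`. This is a sketch for exploration only: `sklearn.tree._tree.Tree` is private, and the node fields differ between releases (newer versions add fields such as `missing_go_to_left`), so the printed dtype on your machine is the one to copy. The toy `X`/`y` data below is made up.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Throwaway data: one feature, three classes, just to get a fitted Tree.
X = np.array([[4.0], [5.5], [7.0]])
y = np.array([0, 1, 2])
clf = DecisionTreeClassifier(random_state=0).fit(X, y)

state = clf.tree_.__getstate__()
print(sorted(state))               # keys of the state dict
print(state["nodes"].dtype)        # field names and types of each node record
print(state["values"].shape)       # (node_count, n_outputs, n_classes)
print(state["nodes"])              # the actual node records of the fitted tree
```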
After hours of trying to change the nodes by hand, I found a solution. Indeed, you are right: by using __setstate__ you can customize the tree. The 'nodes' key must be a structured array with one record per node, using the same fields as in the output of __getstate__.
A value of -1 in left_child/right_child and -2 in feature marks a leaf (leaves also get a threshold of -2).
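A minimal sketch of a node array for the question's tree, assuming your scikit-learn exposes `NODE_DTYPE`, `TREE_LEAF` (-1) and `TREE_UNDEFINED` (-2) in the private module `sklearn.tree._tree`. Starting from `np.zeros` with the library's own dtype avoids hard-coding the field list, so version-specific extra fields simply get 0. The sample counts are made up; they do not affect how samples are routed.

```python
import numpy as np
from sklearn.tree._tree import NODE_DTYPE, TREE_LEAF, TREE_UNDEFINED

# One record per node, indexed by node id as in the diagram (0..4).
nodes = np.zeros(5, dtype=NODE_DTYPE)

# Internal nodes 0 and 2 split feature 0 at 5 and 6; leaves use -1 / -2.
nodes["left_child"] = [1, TREE_LEAF, 3, TREE_LEAF, TREE_LEAF]
nodes["right_child"] = [2, TREE_LEAF, 4, TREE_LEAF, TREE_LEAF]
nodes["feature"] = [0, TREE_UNDEFINED, 0, TREE_UNDEFINED, TREE_UNDEFINED]
nodes["threshold"] = [5.0, TREE_UNDEFINED, 6.0, TREE_UNDEFINED, TREE_UNDEFINED]

# Made-up sample counts (impurity is left at 0.0).
nodes["n_node_samples"] = [3, 1, 2, 1, 1]
nodes["weighted_n_node_samples"] = [3.0, 1.0, 2.0, 1.0, 1.0]

is_leaf = nodes["left_child"] == TREE_LEAF
print(np.flatnonzero(is_leaf))  # ids of the leaf nodes
```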
When training a classifier, the state also contains another key, 'values', which holds the output values stored at each node.
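Putting it together, here is a sketch that builds the question's tree end to end. It assumes the private constructor `Tree(n_features, n_classes, n_outputs)` with `n_classes` as an integer array, and a `__setstate__` that reads `max_depth`, `node_count`, `nodes` and `values`, where `values` must be a C-contiguous float64 array of shape `(node_count, n_outputs, max_n_classes)`. Since this is private API, treat it as something to check against your installed version rather than a stable recipe. The class labels 0/1/2 for the three intervals are an assumption for illustration.

```python
import numpy as np
from sklearn.tree._tree import NODE_DTYPE, TREE_LEAF, TREE_UNDEFINED, Tree

# One input feature, one output, three classes (one per interval).
n_features, n_outputs = 1, 1
n_classes = np.array([3], dtype=np.intp)
tree = Tree(n_features, n_classes, n_outputs)

# Node records, indexed by node id as in the diagram.
nodes = np.zeros(5, dtype=NODE_DTYPE)
nodes["left_child"] = [1, TREE_LEAF, 3, TREE_LEAF, TREE_LEAF]
nodes["right_child"] = [2, TREE_LEAF, 4, TREE_LEAF, TREE_LEAF]
nodes["feature"] = [0, TREE_UNDEFINED, 0, TREE_UNDEFINED, TREE_UNDEFINED]
nodes["threshold"] = [5.0, TREE_UNDEFINED, 6.0, TREE_UNDEFINED, TREE_UNDEFINED]
nodes["n_node_samples"] = [3, 1, 2, 1, 1]
nodes["weighted_n_node_samples"] = [3.0, 1.0, 2.0, 1.0, 1.0]

# values[node, output, class]; each leaf puts all its weight on one class.
values = np.zeros((5, n_outputs, 3), dtype=np.float64)
values[0, 0] = [1.0, 1.0, 1.0]  # root
values[1, 0] = [1.0, 0.0, 0.0]  # leaf 1: (-inf, 5] -> class 0
values[2, 0] = [0.0, 1.0, 1.0]  # node 2
values[3, 0] = [0.0, 1.0, 0.0]  # leaf 3: (5, 6]    -> class 1
values[4, 0] = [0.0, 0.0, 1.0]  # leaf 4: (6, inf)  -> class 2

tree.__setstate__({"max_depth": 2, "node_count": 5,
                   "nodes": nodes, "values": values})

# Tree.apply / Tree.predict expect a 2-D float32 array.
X = np.array([[-3.0], [5.0], [5.5], [6.0], [6.5]], dtype=np.float32)
print(tree.apply(X))                   # leaf ids: [1 1 3 3 4]
print(tree.predict(X).argmax(axis=1))  # classes:  [0 0 1 1 2]
```

This gives a bare `Tree`, which is enough for testing code that calls `apply` or `predict` on it; plugging it into a `DecisionTreeClassifier` would need further estimator attributes that are not covered here.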