I am trying to set up a GBDTLRClassifier following the instructions here. First, I label-encoded my columns. Then I defined my categorical and continuous features by putting the column names in two lists.
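import lightgbm as lgb
from sklearn.linear_model import LogisticRegression
from sklearn_pandas import DataFrameMapper
from sklearn2pmml import make_pmml_pipeline, sklearn2pmml
from sklearn2pmml.decoration import CategoricalDomain, ContinuousDomain
from sklearn2pmml.ensemble import GBDTLRClassifier
from sklearn2pmml.pipeline import PMMLPipeline
cat # categorical column names
conts # continuous column names
gbm = lgb.LGBMClassifier(n_estimators=90)
classifier = GBDTLRClassifier(gbm, LogisticRegression(penalty='l2'))
dm = DataFrameMapper([([cat_col], CategoricalDomain()) for cat_col in cat] + [(conts, ContinuousDomain())])
pipeline = PMMLPipeline([('mapper', dm), ('classifier', classifier)])
pipeline.fit(df[cat + conts], df['y'], classifier__gbdt__eval_set=[(val[cat + conts], val['y'])], classifier__gbdt__early_stopping_rounds=5, classifier__gbdt__categorical_feature=cat)
pp = make_pmml_pipeline(pipeline, target_fields=['y'])
sklearn2pmml(pp, '/tmp/lgb+lr.pmml')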
I get an error message during fitting: TypeError: Wrong type(str) or unknown name(root) in categorical_feature. Yet root is definitely in cat. It looks like LightGBM is not aware of which columns are categorical, which is confusing.
Moreover, when I remove the mapper step, fitting succeeds, but the conversion to a PMML file fails with the message: transformer object of the first step does not specify the number of input features
Could anyone tell me how to make this procedure work? Thanks.
Based on the comment here, feature_name needs to be set when string column names are passed to categorical_feature. That is a little tricky.
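A minimal sketch of how those fit parameters could be built, assuming the classifier__gbdt__ prefix from the question also routes feature_name to LightGBM. The likely cause of the error is that DataFrameMapper returns a plain array by default, so LightGBM never sees the column names. The helper names gbdt_fit_params and categorical_indices and the example columns (other than root) are hypothetical. The second helper shows an alternative: LightGBM also accepts integer positions in categorical_feature, which do not need feature_name.

```python
from typing import Dict, List, Sequence


def gbdt_fit_params(cat: Sequence[str], conts: Sequence[str],
                    prefix: str = "classifier__gbdt__") -> Dict[str, List[str]]:
    """Build fit keyword arguments that name every column for LightGBM."""
    # The order must match the mapper output in the question:
    # categorical columns first, then the continuous ones.
    feature_names = list(cat) + list(conts)
    if len(set(feature_names)) != len(feature_names):
        raise ValueError("column names must be unique")
    return {
        prefix + "feature_name": feature_names,
        prefix + "categorical_feature": list(cat),
    }


def categorical_indices(cat: Sequence[str], conts: Sequence[str]) -> List[int]:
    """Alternative: positions of the categorical columns in the mapper output."""
    feature_names = list(cat) + list(conts)
    return [feature_names.index(name) for name in cat]


# Hypothetical column names, for illustration only.
params = gbdt_fit_params(["root", "city"], ["age", "income"])
indices = categorical_indices(["root", "city"], ["age", "income"])
print(params)
print(indices)

# Intended use with the pipeline from the question (assumption, not run here):
# pipeline.fit(df[cat + conts], df['y'], **gbdt_fit_params(cat, conts))
```

Keeping the names in the same order as the mapper definitions matters, because LightGBM matches feature_name to columns by position only.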