I have a scikit-learn model that I can train on GCP using AI Platform Training. I also want to do hyperparameter tuning with AI Platform Training. This is possible; you just need to pass a YAML file with the parameters and their ranges:
params:
- parameterName: max_df
  type: DOUBLE
  minValue: 0.0
  maxValue: 1.0
  scaleType: UNIT_LINEAR_SCALE
- parameterName: min_df
  type: DOUBLE
  minValue: 0.0
  maxValue: 1.0
  scaleType: UNIT_LINEAR_SCALE
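For context, AI Platform Training passes each tuned hyperparameter to the trainer as a command-line flag named after its parameterName, so the training code typically reads them with argparse. A minimal sketch (the example values are placeholders, not real tuner output):

```python
import argparse

# Each tuned hyperparameter arrives as a command-line flag whose name
# matches the parameterName declared in the YAML spec.
parser = argparse.ArgumentParser()
parser.add_argument('--max_df', type=float, default=1.0)
parser.add_argument('--min_df', type=float, default=0.0)

# Example invocation; in a real trial these come from sys.argv.
args = parser.parse_args(['--max_df', '0.8', '--min_df', '0.2'])

# The constraint scikit-learn expects between the two parameters:
feasible = args.min_df < args.max_df
```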
The issue is that there is a dependency between the two parameters: min_df < max_df. If this does not hold, scikit-learn will fail, as expected. It doesn't seem possible to express such a dependency in the YAML.
I can tune the number of allowed failed trials, but if I am unlucky and the very first job has min_df > max_df, then the whole hyperparameter tuning process will stop. That doesn't seem like a valid option.
link doc
I can control this inside my Python code and ensure that min_df < max_df, but what should I return to the code doing the hyperparameter tuning (using Bayesian optimization, I assume) so it understands that such a selection of parameters is invalid?
import hypertune

# This is for hyperparameter tuning.
hpt = hypertune.HyperTune()
hpt.report_hyperparameter_tuning_metric(
    hyperparameter_metric_tag='accuracy',
    metric_value=accuracy,
    global_step=0)
Is just returning an accuracy of 0.0 good enough? Or should I return None or NaN? I didn't find any documentation on this topic.
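One way to handle the invalid region is to guard the constraint in the trainer and report a floor metric instead of crashing, so the optimizer learns to avoid it. A sketch, assuming a maximize-accuracy goal; run_trial, train_fn, and report_fn are hypothetical stand-ins for the real training loop and hypertune call:

```python
# Sketch: guard the dependent-parameter constraint before training and
# report the worst possible score for infeasible trials, so the Bayesian
# optimizer steers away from that region instead of the job crashing.
def run_trial(min_df, max_df, train_fn, report_fn):
    if not min_df < max_df:
        # Infeasible combination: skip training, report the floor metric.
        report_fn(0.0)
        return 0.0
    accuracy = train_fn(min_df, max_df)
    report_fn(accuracy)
    return accuracy

# Hypothetical stand-ins for the real training and reporting calls:
reported = []
score = run_trial(0.9, 0.1,
                  train_fn=lambda lo, hi: 0.85,
                  report_fn=reported.append)
# min_df=0.9 > max_df=0.1, so 0.0 is reported instead of training.
```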
Bonus question: when using YAML, I can only pass strings, with nothing like None or NULL. link doc
- parameterName: FT_norm
  type: CATEGORICAL
  categoricalValues: ['l1', 'l2', 'None']
I need to convert the string 'None' to None directly in the Python code before passing the value to the model. Is there a better way to handle such cases? (I am using the gcloud CLI.) For example, using the GCP Python client library?
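The string-to-None conversion mentioned above can be a one-line mapping in the trainer; a minimal sketch (parse_categorical is a hypothetical helper name):

```python
# Sketch: map the YAML-safe string sentinel 'None' back to Python None
# before handing the value to the model (e.g. a norm= keyword argument).
def parse_categorical(value):
    return None if value == 'None' else value

norm_a = parse_categorical('None')  # becomes None
norm_b = parse_categorical('l2')    # stays 'l2'
```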
In the end, I implemented the idea described above: return a metric of 0.0 (accuracy in my test) when the parameters given to scikit-learn are invalid (e.g. when min_df > max_df).
As you can see below, no accuracy was reported when the value 0.0 was returned for invalid hyperparameters:
I also found that the code only accepts a float or a string as the metric value, as shown below, though I didn't find documentation explaining this in detail:
File "/root/.local/lib/python3.5/site-packages/hypertune/hypertune.py", line 62, in report_hyperparameter_tuning_metric
metric_value = float(metric_value)
TypeError: float() argument must be a string or a number, not 'NoneType'
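That traceback matches plain float() coercion, which accepts numbers and numeric strings but raises on None; a quick check of that behavior:

```python
# float() accepts numbers and numeric strings, which is why the metric
# may be passed as either; None raises the TypeError shown in the traceback.
ok_number = float(0.93)
ok_string = float('0.93')

try:
    float(None)
    raised = False
except TypeError:
    raised = True
```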
I am not sure this is 100% correct, but it seems to work as expected.