I have a scikit-learn model that I can train on GCP using AI Platform Training. I also want to do hyperparameter tuning with AI Platform Training. This is possible; you just need to pass a YAML file with the parameters and their ranges:
params:
- parameterName: max_df
  type: DOUBLE
  minValue: 0.0
  maxValue: 1.0
  scaleType: UNIT_LINEAR_SCALE
- parameterName: min_df
  type: DOUBLE
  minValue: 0.0
  maxValue: 1.0
  scaleType: UNIT_LINEAR_SCALE
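For context, AI Platform Training passes each tuned hyperparameter to the trainer as a command-line flag named after its parameterName, so the training code typically reads them with argparse. A minimal sketch (the example values are placeholders, not real tuner output):

```python
import argparse

# Each tuned hyperparameter arrives as a command-line flag whose name
# matches the parameterName declared in the YAML spec.
parser = argparse.ArgumentParser()
parser.add_argument('--max_df', type=float, default=1.0)
parser.add_argument('--min_df', type=float, default=0.0)

# Example invocation; in a real trial these come from sys.argv.
args = parser.parse_args(['--max_df', '0.8', '--min_df', '0.2'])

# The constraint scikit-learn expects between the two parameters:
feasible = args.min_df < args.max_df
```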
The issue is that there is a dependency between the two parameters: min_df < max_df. If this does not hold, scikit-learn will fail, as expected. It doesn't seem possible to express such a dependency in the YAML.
I can tune the number of allowed failed trials, but if I am unlucky and the very first job has min_df > max_df, then the whole hyperparameter tuning process will stop. That doesn't seem like a valid option.
link doc
I can control this inside my Python code and ensure that min_df < max_df, but what should I return to the code doing the hyperparameter tuning (using Bayesian optimization, I assume) so it understands that such a selection of parameters is invalid?
import hypertune

# This is for hyperparameter tuning.
hpt = hypertune.HyperTune()
hpt.report_hyperparameter_tuning_metric(
    hyperparameter_metric_tag='accuracy',
    metric_value=accuracy,
    global_step=0)
Is just returning an accuracy of 0.0 good enough? Or should I return None or NaN? I didn't find any documentation on this topic.
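One way to handle the invalid region is to guard the constraint in the trainer and report a floor metric instead of crashing, so the optimizer learns to avoid it. A sketch, assuming a maximize-accuracy goal; run_trial, train_fn, and report_fn are hypothetical stand-ins for the real training loop and hypertune call:

```python
# Sketch: guard the dependent-parameter constraint before training and
# report the worst possible score for infeasible trials, so the Bayesian
# optimizer steers away from that region instead of the job crashing.
def run_trial(min_df, max_df, train_fn, report_fn):
    if not min_df < max_df:
        # Infeasible combination: skip training, report the floor metric.
        report_fn(0.0)
        return 0.0
    accuracy = train_fn(min_df, max_df)
    report_fn(accuracy)
    return accuracy

# Hypothetical stand-ins for the real training and reporting calls:
reported = []
score = run_trial(0.9, 0.1,
                  train_fn=lambda lo, hi: 0.85,
                  report_fn=reported.append)
# min_df=0.9 > max_df=0.1, so 0.0 is reported instead of training.
```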
Bonus question: when using YAML, I can only pass strings, with nothing like None or NULL. link doc
- parameterName: FT_norm
  type: CATEGORICAL
  categoricalValues: ['l1', 'l2', 'None']
I need to convert the string 'None' to None directly in the Python code before passing the value to the model. Is there a better way to handle such cases? (I am using the gcloud CLI.) For example, using the GCP Python client library?
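The string-to-None conversion mentioned above can be a one-line mapping in the trainer; a minimal sketch (parse_categorical is a hypothetical helper name):

```python
# Sketch: map the YAML-safe string sentinel 'None' back to Python None
# before handing the value to the model (e.g. a norm= keyword argument).
def parse_categorical(value):
    return None if value == 'None' else value

norm_a = parse_categorical('None')  # becomes None
norm_b = parse_categorical('l2')    # stays 'l2'
```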
In the end, I implemented the idea described above: return a metric of 0.0 (accuracy in my test) when the parameters given to scikit-learn are invalid (e.g. when min_df > max_df).
As you can see below, no accuracy was reported when the value 0.0 was returned for invalid hyperparameters:
I also found that the code only accepts a float or a string as the metric value, as shown below, though I didn't find documentation explaining this in detail:
File "/root/.local/lib/python3.5/site-packages/hypertune/hypertune.py", line 62, in report_hyperparameter_tuning_metric
metric_value = float(metric_value)
TypeError: float() argument must be a string or a number, not 'NoneType'
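That traceback matches plain float() coercion, which accepts numbers and numeric strings but raises on None; a quick check of that behavior:

```python
# float() accepts numbers and numeric strings, which is why the metric
# may be passed as either; None raises the TypeError shown in the traceback.
ok_number = float(0.93)
ok_string = float('0.93')

try:
    float(None)
    raised = False
except TypeError:
    raised = True
```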
I am not sure this is 100% correct, but it seems to work as expected.