deep-learning pytorch reinforcement-learning stable-baselines wandb

Hyperparameter Tuning with Wandb Sweep for custom parameters

I'm trying to tune the hyperparameters using the Stable-Baseline-3 Library for the network architecture.

My configuration file is:

program: main.py
method: bayes
name: sweep
metric:
  goal: minimize
  name: train/loss
parameters:
  batch_size:
    values: [16, 32, 64, 128, 256, 512, 1024]
  epochs:
    values: [20, 50, 100, 200, 250, 300]
  lr:
    max: 0.1
    min: 0.000001

But if I try to add to the parameters:

  policy_kwargs:
    net_arch:
      pi:
        values: [[ 128, 128 ],[ 256, 256 ],[ 512, 512 ]]
      vf:
        values: [[ 128, 128 ],[ 256, 256 ],[ 512, 512 ]]

I got the following error:

wandb.errors.CommError: Invalid sweep config: invalid hyperparameter configuration: policy_kwargs

Is it possible to use wandb sweep with Stable-Baseline-3 for the network architecture?

Solution

You are trying to create a nested config. Please refer to this documentation here.

Your configuration should be:

program: main.py
method: bayes
name: sweep
metric:
  goal: minimize
  name: train/loss
parameters:
  batch_size:
    values: [16, 32, 64, 128, 256, 512, 1024]
  epochs:
    values: [20, 50, 100, 200, 250, 300]
  lr:
    max: 0.1
    min: 0.000001
  policy_kwargs:
    parameters:
        net_arch:
            parameters:
                pi:
                    values: [[ 128, 128 ],[ 256, 256 ],[ 512, 512 ]]
                vf:
                    values: [[ 128, 128 ],[ 256, 256 ],[ 512, 512 ]]