tensorflow, tensorflow-federated

Applying Differential Privacy in TensorFlow Federated


I was trying to use TensorFlow Privacy with TFF, following the two examples provided here, with my own dataset. I made sure that samples and targets were formatted correctly, and everything worked before I added the DP process with clipping and noise. Unfortunately, in every execution with DP enabled the model diverges instead of converging, with both train and validation loss increasing at each round.

Round  0, 68.89s per round in average.
    Train: loss=5.172, accuracy=0.222
    Validation: loss=6.181, accuracy=0.002

Round  1, 61.52s per round in average.
    Train: loss=4.087, accuracy=0.328
    Validation: loss=6.747, accuracy=0.002

Round  2, 57.98s per round in average.
    Train: loss=4.659, accuracy=0.227
    Validation: loss=7.475, accuracy=0.002

Round  3, 56.62s per round in average.
    Train: loss=5.354, accuracy=0.198
    Validation: loss=8.409, accuracy=0.002
     Updating the best state...

Round  4, 55.25s per round in average.
    Train: loss=6.181, accuracy=0.172
    Validation: loss=9.330, accuracy=0.004

Round  5, 54.36s per round in average.
    Train: loss=7.739, accuracy=0.095
    Validation: loss=10.311, accuracy=0.006

Round  6, 53.83s per round in average.
    Train: loss=9.188, accuracy=0.037
    Validation: loss=11.243, accuracy=0.006

Round  7, 53.63s per round in average.
    Train: loss=9.581, accuracy=0.080
    Validation: loss=12.214, accuracy=0.009

I have tried different combinations of clip and noise_multiplier, but without success. Here is an example:

  'clients_per_round' : 20,
  'client_epochs_per_round' : 2,
  'uniform_weighting' : True,
  'server_optimizer': 'adam',
  'client_optimizer': 'adam',

  'clip': 0.05,  # L2 norm
  'noise_multiplier' : 1.0,
  'adaptive_clip_learning_rate' : 0,
  'target_unclipped_quantile' : 0.5,
  'clipped_count_budget_allocation' : 0.1,
  'per_vector_clipping' : False,

Any idea what the problem could be? With 'noise_multiplier': False everything worked properly. The definition of the dp_query and the averaging process is basically the same as in the example:

# Build the DP query that clips client updates and adds Gaussian noise.
dp_query = tff.utils.build_dp_query(
    clip=FLAGS.clip,
    noise_multiplier=FLAGS.noise_multiplier,
    expected_total_weight=FLAGS.clients_per_round,
    adaptive_clip_learning_rate=FLAGS.adaptive_clip_learning_rate,
    target_unclipped_quantile=FLAGS.target_unclipped_quantile,
    clipped_count_budget_allocation=FLAGS.clipped_count_budget_allocation,
    expected_clients_per_round=FLAGS.clients_per_round,
    per_vector_clipping=FLAGS.per_vector_clipping,
    model=model_fn())

# Wrap the query in an aggregation process over the model's trainable weights.
weights_type = tff.learning.framework.weights_type_from_model(model_fn)
aggregation_process = tff.utils.build_dp_aggregate_process(
    weights_type.trainable, dp_query)
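
For context, the aggregation process is then plugged into federated averaging along these lines. This is only a rough sketch against the same pre-1.0 TFF API as the snippet above; the aggregation_process argument of build_federated_averaging_process and the uniform client_weight_fn are assumptions about that API version, and model_fn and aggregation_process come from the code above:

import tensorflow as tf
import tensorflow_federated as tff

iterative_process = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.Adam(),
    server_optimizer_fn=lambda: tf.keras.optimizers.Adam(),
    # Uniform weighting, so the clip norm bounds each client's contribution
    # to the noised sum (matches 'uniform_weighting': True above).
    client_weight_fn=lambda _: 1.0,
    # The DP aggregation process built above replaces the default mean.
    aggregation_process=aggregation_process)

state = iterative_process.initialize()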

Thank you!


Solution

  • Your noise_multiplier is too high for your number of clients_per_round. Following the methodology in "Learning Differentially Private Recurrent Language Models", you should first find the largest noise_multiplier that still allows training with good utility, then scale up noise_multiplier and proportionally scale up clients_per_round to train a final model with good privacy, as illustrated in the sketch below.
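
    To make the scaling concrete, here is a small illustrative sketch. It assumes the usual Gaussian average query semantics behind build_dp_query: noise with standard deviation clip * noise_multiplier is added to the clipped sum of client updates, and the sum is then divided by the expected total weight (clients_per_round under uniform weighting).

    def noise_std_on_average(clip, noise_multiplier, clients_per_round):
        # Per-coordinate standard deviation of the DP noise that lands on the
        # averaged model update after dividing the noised sum by the total weight.
        return clip * noise_multiplier / clients_per_round

    # The question's settings: every coordinate of the average gets noise with
    # std 0.0025, while the whole averaged update has norm at most clip = 0.05,
    # so over many model parameters the noise dominates the signal.
    print(noise_std_on_average(clip=0.05, noise_multiplier=1.0, clients_per_round=20))  # 0.0025

    # Scaling noise_multiplier and clients_per_round by the same factor keeps the
    # noise on the average unchanged while strengthening the privacy guarantee.
    print(noise_std_on_average(clip=0.05, noise_multiplier=4.0, clients_per_round=80))  # 0.0025

    In practice, first find a much smaller noise_multiplier at which training converges with 20 clients per round, and only then scale both values up proportionally for the final, more private run.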