I was trying to use TensorFlow Privacy with TFF, following the two examples provided here, with my own dataset. I made sure that the samples and targets were formatted correctly, and everything worked before adding the DP process with clipping and noise. Unfortunately, in any execution with DP enabled the model diverges instead of converging, with both train and validation loss increasing at each round:
Round 0, 68.89s per round in average.
Train: loss=5.172, accuracy=0.222
Validation: loss=6.181, accuracy=0.002
Round 1, 61.52s per round in average.
Train: loss=4.087, accuracy=0.328
Validation: loss=6.747, accuracy=0.002
Round 2, 57.98s per round in average.
Train: loss=4.659, accuracy=0.227
Validation: loss=7.475, accuracy=0.002
Round 3, 56.62s per round in average.
Train: loss=5.354, accuracy=0.198
Validation: loss=8.409, accuracy=0.002
Updating the best state...
Round 4, 55.25s per round in average.
Train: loss=6.181, accuracy=0.172
Validation: loss=9.330, accuracy=0.004
Round 5, 54.36s per round in average.
Train: loss=7.739, accuracy=0.095
Validation: loss=10.311, accuracy=0.006
Round 6, 53.83s per round in average.
Train: loss=9.188, accuracy=0.037
Validation: loss=11.243, accuracy=0.006
Round 7, 53.63s per round in average.
Train: loss=9.581, accuracy=0.080
Validation: loss=12.214, accuracy=0.009
I have tried different combinations of clip and noise_multiplier, but without any success. Here is an example (the corresponding flag definitions are sketched right after the list):
'clients_per_round': 20,
'client_epochs_per_round': 2,
'uniform_weighting': True,
'server_optimizer': 'adam',
'client_optimizer': 'adam',
'clip': 0.05,  # L2 norm
'noise_multiplier': 1.0,
'adaptive_clip_learning_rate': 0,
'target_unclipped_quantile': 0.5,
'clipped_count_budget_allocation': 0.1,
'per_vector_clipping': False,
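For reference, the flags are defined with absl in the usual way, roughly like this (a sketch, with defaults mirroring the values above):

from absl import flags

flags.DEFINE_integer('clients_per_round', 20, 'Clients sampled per round.')
flags.DEFINE_integer('client_epochs_per_round', 2, 'Local epochs per client.')
flags.DEFINE_boolean('uniform_weighting', True, 'Weight all clients equally.')
flags.DEFINE_string('server_optimizer', 'adam', 'Server optimizer.')
flags.DEFINE_string('client_optimizer', 'adam', 'Client optimizer.')
flags.DEFINE_float('clip', 0.05, 'Initial L2 norm clip on client updates.')
flags.DEFINE_float('noise_multiplier', 1.0,
                   'Noise stddev as a multiple of the clip norm.')
flags.DEFINE_float('adaptive_clip_learning_rate', 0.0,
                   '0 disables adaptive clipping.')
flags.DEFINE_float('target_unclipped_quantile', 0.5,
                   'Target quantile for adaptive clipping.')
flags.DEFINE_float('clipped_count_budget_allocation', 0.1,
                   'Fraction of privacy budget for clipped-count estimation.')
flags.DEFINE_boolean('per_vector_clipping', False,
                     'Clip each weight tensor separately.')

FLAGS = flags.FLAGS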
Any idea what the problem could be? With 'noise_multiplier': False everything was working properly. The definition of the DP query and the averaging process is basically the same as in the example:
dp_query = tff.utils.build_dp_query(
    clip=FLAGS.clip,
    noise_multiplier=FLAGS.noise_multiplier,
    expected_total_weight=FLAGS.clients_per_round,
    adaptive_clip_learning_rate=FLAGS.adaptive_clip_learning_rate,
    target_unclipped_quantile=FLAGS.target_unclipped_quantile,
    clipped_count_budget_allocation=FLAGS.clipped_count_budget_allocation,
    expected_clients_per_round=FLAGS.clients_per_round,
    per_vector_clipping=FLAGS.per_vector_clipping,
    model=model_fn())

weights_type = tff.learning.framework.weights_type_from_model(model_fn)

aggregation_process = tff.utils.build_dp_aggregate_process(
    weights_type.trainable, dp_query)
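For completeness, the aggregation process is then wired into federated averaging roughly like this (a sketch: the aggregation_process keyword follows the TFF research examples of that era and has since been replaced by DP aggregation factories in newer releases):

import tensorflow as tf
import tensorflow_federated as tff

iterative_process = tff.learning.build_federated_averaging_process(
    model_fn,
    client_optimizer_fn=lambda: tf.keras.optimizers.Adam(),
    server_optimizer_fn=lambda: tf.keras.optimizers.Adam(),
    aggregation_process=aggregation_process)

state = iterative_process.initialize()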
Thank you!
Your noise_multiplier is too high for your number of clients_per_round. Following the methodology in "Learning Differentially Private Recurrent Language Models" (McMahan et al., 2018), you should first find the largest noise_multiplier that still trains with good utility at your current scale, then scale up both noise_multiplier and clients_per_round proportionally to train the final model with good privacy: the noise on the averaged update shrinks with the number of clients, so scaling both together preserves utility while strengthening the privacy guarantee.
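A back-of-the-envelope check shows why your current settings diverge (a sketch: it assumes the Gaussian query adds per-coordinate noise with stddev clip * noise_multiplier to the clipped sum before dividing by clients_per_round, and a hypothetical 1M-parameter model, since your model size isn't shown):

import math

def noise_to_signal_ratio(noise_multiplier, clients_per_round, num_params):
    """Rough lower bound on the noise-to-signal ratio of the averaged update.

    Per-coordinate noise stddev on the average is
    clip * noise_multiplier / clients_per_round, so the expected L2 norm of
    the noise is that times sqrt(num_params), while the clipped average
    update has norm at most clip. The clip cancels out of the ratio.
    """
    return noise_multiplier * math.sqrt(num_params) / clients_per_round

print(noise_to_signal_ratio(1.0, 20, 1_000_000))    # 50.0 -- noise swamps the signal
# First find a noise_multiplier that trains well at your scale...
print(noise_to_signal_ratio(0.01, 20, 1_000_000))   # 0.5
# ...then scale noise_multiplier and clients_per_round up together:
# same utility, much stronger privacy.
print(noise_to_signal_ratio(1.0, 2000, 1_000_000))  # 0.5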