Evaluating a model in TensorFlow Federated

I am following this tutorial for image classification using TFF. The only difference is that I am using 3D images of size 128x128x3.
During training I evaluate the model in each round, as shown below:

for round_num in range(0, NUM_ROUNDS):
  state, train_metrics = iterative_process.next(state, federated_train_data)
  metrics = eval(state.model, federated_test_data)

  print(' TRAINING round {:2d}, metrics={}, loss={}'.format(round_num, train_metrics['train']['binary_accuracy'],train_metrics['train']['loss']))
  print(' TESTING round {:2d}, metrics={}, loss={}'.format(round_num, metrics['eval']['binary_accuracy'],metrics['eval']['loss']))

where:

eval = tff.learning.build_federated_evaluation(model_fn)

In that case I get results like the ones below:

TRAINING round  0, metrics=0.4614659249782562, loss=1.136414647102356
 TESTING round  0, metrics=0.25431033968925476, loss=1.3862652778625488
 TRAINING round  1, metrics=0.5336836576461792, loss=1.0317342281341553
 TESTING round  1, metrics=0.25538793206214905, loss=1.3862823247909546
 TRAINING round  2, metrics=0.6359471678733826, loss=0.8686623573303223
 TESTING round  2, metrics=0.25538793206214905, loss=1.3865610361099243
 TRAINING round  3, metrics=0.6370250582695007, loss=0.8408811092376709
 TESTING round  3, metrics=0.25646552443504333, loss=1.3872325420379639
 TRAINING round  4, metrics=0.7109943628311157, loss=0.6903313994407654
 TESTING round  4, metrics=0.25538793206214905, loss=1.3889813423156738
 TRAINING round  5, metrics=0.7504715919494629, loss=0.6067320704460144
 TESTING round  5, metrics=0.25538793206214905, loss=1.3922673463821411
 TRAINING round  6, metrics=0.7718943953514099, loss=0.540172815322876
 TESTING round  6, metrics=0.25646552443504333, loss=1.3983343839645386

The model is clearly learning, since training accuracy improves every round, but the evaluation metrics are frozen. I suspect that somehow only one class is being predicted (my problem has 4 classes). The strange part is this:
If I pass federated_train_data instead of federated_test_data to eval(), I still get the same evaluation results, even though the training metrics reported for that same data are clearly different. Any ideas about this?
Is TFF doing any internal preprocessing for the evaluation step?
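
As a quick sanity check on the numbers above (a sketch, not a diagnosis): with 4 classes, a model that outputs near-uniform probabilities has a cross-entropy loss of ln 4, about 1.3863, and an accuracy near 0.25, which is close to the frozen TESTING values. The helper names and the example probabilities below are hypothetical and only illustrate how one might check for collapsed predictions, assuming the loss is a categorical cross-entropy.

```python
import math
from collections import Counter

NUM_CLASSES = 4  # from the question: the problem has 4 classes


def uniform_baseline_loss(num_classes):
    """Cross-entropy of a model that gives every class probability 1/num_classes."""
    return -math.log(1.0 / num_classes)


def predicted_class_counts(probabilities):
    """Count how often each class index is the argmax of a probability row."""
    return Counter(
        max(range(len(row)), key=row.__getitem__) for row in probabilities
    )


baseline = uniform_baseline_loss(NUM_CLASSES)
print(f"uniform baseline loss: {baseline:.4f}")

# Hypothetical model outputs for illustration only (not taken from the question).
example_probs = [
    [0.26, 0.25, 0.25, 0.24],
    [0.27, 0.24, 0.25, 0.24],
    [0.25, 0.26, 0.25, 0.24],
]
print(predicted_class_counts(example_probs))
```

If the evaluation loss stays at this baseline while the training loss drops, one thing worth ruling out is that the weights passed to eval() are not the trained ones.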


  • state.model does not work for me. The following gives me your desired output:

    eval = tff.learning.build_federated_evaluation(model_fn)
    for round_num in range(0, NUM_ROUNDS):
      state, train_metrics = iterative_process.next(state, federated_train_data)
      metrics = eval(iterative_process.get_model_weights(state), federated_test_data)
      print(' TRAINING round {:2d}, metrics={}, loss={}'.format(round_num, train_metrics['client_work']['train']['binary_accuracy'],train_metrics['client_work']['train']['loss']))
      print(' TESTING round {:2d}, metrics={}, loss={}'.format(round_num, metrics['eval']['binary_accuracy'],metrics['eval']['loss']))
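
The two snippets in this thread read the training metrics from different places: train_metrics['train'] in the question and train_metrics['client_work']['train'] in the reply. A minimal sketch of a helper that accepts either layout is shown below; get_train_metrics is a hypothetical name, and it assumes the metrics behave like nested dicts with exactly these keys.

```python
def get_train_metrics(round_metrics):
    """Return the 'train' metrics from either layout seen in this thread."""
    # Layout used in the question: metrics['train'][...]
    if "train" in round_metrics:
        return round_metrics["train"]
    # Layout used in the reply: metrics['client_work']['train'][...]
    return round_metrics["client_work"]["train"]


# Hypothetical metric values, for illustration only.
old_layout = {"train": {"binary_accuracy": 0.5, "loss": 1.0}}
new_layout = {"client_work": {"train": {"binary_accuracy": 0.5, "loss": 1.0}}}

print(get_train_metrics(old_layout)["loss"])
print(get_train_metrics(new_layout)["loss"])
```

This only smooths over the metrics layout; which object to pass to eval() (state.model or iterative_process.get_model_weights(state)) still depends on the installed TFF version.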