Tags: python, tensorflow, machine-learning, mnist, federated-learning

Accuracy decreasing after each iteration in a federated learning setting


I am working on a federated learning setup to detect bad clients.

Brief overview of federated learning: the data is split across several clients, each client trains a model locally and sends its resulting weights to a central server, the server aggregates the client weights, and the aggregated model is then sent back to the clients for the next round of local training.
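
For reference, a minimal, self-contained sketch of one such round might look like the following. This is illustrative only; local_train here is a trivial stand-in for client-side training, and none of this is the question's base code:

    import numpy as np

    def local_train(global_weights, data):
        # Stand-in for client-side training: just nudge the received global weights.
        return [w + 0.01 * np.sign(data.mean()) for w in global_weights]

    def federated_round(global_weights, client_datasets):
        # Each client trains locally starting from the current global weights.
        client_weights = [local_train(global_weights, d) for d in client_datasets]
        # The server averages the client weights layer by layer (FedAvg-style);
        # the result is sent back to the clients for the next round.
        return [np.mean(layers, axis=0) for layers in zip(*client_weights)]

    # Toy usage: a two-layer "model" and three clients with random data.
    global_weights = [np.zeros((4, 4)), np.zeros(4)]
    datasets = [np.random.randn(50, 4) for _ in range(3)]
    for _ in range(5):
        global_weights = federated_round(global_weights, datasets)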

I am working on detecting clients that send malicious updates to the central server. I am using the base code available here.

I wrote a filter-client method that detects whether a client is malicious and removes that client from the aggregation step. I expected that removing one client's weights from the global aggregation would not make much of a performance difference, but the results are confusing me. I added the piece of code below; noisy_client[itr] != 0 occurs for only 1 of the 10 clients, and it is the same client in every iteration.

    if noisy_client[itr] == 0:
        scaled_local_weight_list.append(scaled_weights)
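
For context, here is a self-contained sketch of where such a filter could sit in one aggregation round. The uniform 1/num_clients scaling and the layer-wise summing are assumptions about how the base code aggregates, not the actual code from the question:

    import numpy as np

    num_clients = 10
    noisy_client = [0] * num_clients
    noisy_client[3] = 1  # pretend client 3 has been flagged as malicious

    # Stand-in for each client's locally trained weights (two layers per client).
    local_weights = [[np.random.randn(4, 4), np.random.randn(4)] for _ in range(num_clients)]

    scaled_local_weight_list = []
    for itr in range(num_clients):
        # Assumed scaling by the client's data share (uniform split here).
        scaled_weights = [layer / num_clients for layer in local_weights[itr]]
        if noisy_client[itr] == 0:  # the question's filter: keep only non-flagged clients
            scaled_local_weight_list.append(scaled_weights)

    # Layer-wise sum of the remaining scaled weights gives the new global weights.
    global_weights = [np.sum(layers, axis=0) for layers in zip(*scaled_local_weight_list)]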
        

If this code is not used, the accuracy increases steadily with each iteration:

0.6102380952380952
0.7195238095238096
0.7723809523809524
0.8014285714285714
0.8195238095238095
0.8314285714285714
0.8397619047619047
0.8438095238095238
0.8516666666666667
0.8545238095238096
0.8573809523809524
0.8602380952380952
0.861904761904762
0.8635714285714285
0.8654761904761905
0.8671428571428571
0.8683333333333333

But when the code is used, the accuracy increases for the first few iterations and then decreases with every iteration after that:

0.6883333333333334
0.7373809523809524
0.7552380952380953
0.765
0.763095238095238
0.7559523809523809
0.7497619047619047
0.7414285714285714
0.7323809523809524
0.7221428571428572
0.7154761904761905
0.705952380952381
0.6966666666666667
0.6895238095238095
0.6819047619047619
0.6730952380952381
0.6597619047619048
0.6102380952380952

I have tried reducing the learning rate from 0.01 to 0.001 and also decreasing the batch size, but I saw the same behavior. What could be the reason for this, and how can it be corrected?


Solution

  • A common problem could be that you are trying to aggregate inside a no_grad() scope. This happened to me once: the optimizer was essentially resetting every federated round even though the models were being aggregated.

    This is just a hunch; I can't say more since I haven't seen any code.
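
    For concreteness, a minimal PyTorch-style sketch of what "aggregating in a no_grad() scope" typically looks like is below. The model and helper names are placeholders, not taken from the question's (TensorFlow) code:

        import torch
        import torch.nn as nn

        def average_into_global(global_model, client_models):
            # Average corresponding parameters from the client models into the
            # global model, without tracking gradients for these copies.
            with torch.no_grad():
                for name, global_param in global_model.named_parameters():
                    stacked = torch.stack(
                        [dict(m.named_parameters())[name] for m in client_models]
                    )
                    global_param.copy_(stacked.mean(dim=0))
            # Note: any optimizer state attached to the client models lives in
            # separate optimizer objects and is not carried over by this copy.

        # Toy usage with identical tiny models.
        clients = [nn.Linear(4, 2) for _ in range(3)]
        global_model = nn.Linear(4, 2)
        average_into_global(global_model, clients)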