Federated learning process with Poisson subsampling of participants

I'm running a few experiments with TFF. In this one, I would like to sample the participating clients at each training round according to Poisson subsampling, where each client is sampled with probability p = users_per_round / num_users.

At each round, Poisson subsampling is repeated until the list sampled_ids contains users_per_round unique ids.

import numpy as np

total_rounds = 100
num_users = 500
users_per_round = 150
lambda_value = np.random.rand()

for round_num in range(total_rounds):

    sampled_ids = []

    # Keep drawing until users_per_round unique ids have been collected.
    while len(sampled_ids) < users_per_round:

        subsampling = np.random.poisson(lambda_value, num_users)
        whether = subsampling > 1 - users_per_round / num_users
        for i in np.arange(num_users):
            if (whether[i] and len(sampled_ids) < users_per_round
                    and i not in sampled_ids):
                sampled_ids.append(i)

    sampled_clients = [train_data.client_ids[i] for i in sampled_ids]

    sampled_train_data = [
        train_data.create_tf_dataset_for_client(client)
        for client in sampled_clients
    ]

    server_state, train_metrics = iterative_process.next(
        server_state, sampled_train_data)

Is there a better way of performing Poisson subsampling, especially if the subsampling is applied in differentially private FL, so that the RDP accountant yields accurate privacy analysis results?

What would be the best strategy for setting the value of lambda, other than using random values?

Solution

Poisson subsampling means each user is included independently with probability q. The number of users per round produced by this process is Binomial(num_users, q), which is approximately Poisson distributed when q is small. If you want to sample such that, in expectation, you have users_per_round users in a round, you could do the following:

users_this_round = np.random.poisson(users_per_round)
sampled_ids = np.random.choice(num_users, size=users_this_round, replace=False)
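
The definition above (each user included independently with probability q) can also be sampled directly, without going through a Poisson-distributed round size. The following is only a minimal sketch under that reading: poisson_subsample is a hypothetical helper name, not part of TFF or NumPy, and the seed is chosen purely for reproducibility.

```python
import numpy as np

def poisson_subsample(num_users, q, rng):
    """Return the ids of users selected by independent Bernoulli(q) trials.

    Hypothetical helper for illustration; not a TFF or NumPy function.
    """
    # One uniform draw per user; user i participates iff its draw is below q.
    mask = rng.random(num_users) < q
    return np.flatnonzero(mask)

num_users = 500
users_per_round = 150
q = users_per_round / num_users  # per-user sampling probability

rng = np.random.default_rng(0)  # seed is an assumption, for reproducibility only
sampled_ids = poisson_subsample(num_users, q, rng)
```

With this approach the round size varies from round to round (it is Binomial(num_users, q), with mean users_per_round), and a round can in principle be empty, so the training loop would need to tolerate that.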

If you want to choose exactly users_per_round users (which is technically not Poisson subsampling), you could do this:

sampled_ids = np.random.choice(num_users, size=users_per_round, replace=False)