I'm performing a few experiments with TFF. In this one, I would like to sample the participating clients at each training round according to Poisson subsampling, where each client is sampled with probability p = users_per_round / num_users. At each round, Poisson subsampling is repeated until the list sampled_ids is filled with users_per_round unique ids.

    import numpy as np

    total_rounds = 100
    num_users = 500
    users_per_round = 150
    lambda_value = np.random.rand()

    for round_num in range(total_rounds):
        sampled_ids = []
        while len(sampled_ids) < users_per_round:
            subsampling = np.random.poisson(lambda_value, num_users)
            whether = subsampling > 1 - users_per_round / num_users
            for i in np.arange(num_users):
                if whether[i] and len(sampled_ids) < users_per_round and i not in sampled_ids:
                    sampled_ids.append(i)
        sampled_clients = [train_data.client_ids[i] for i in sampled_ids]
        sampled_train_data = [
            train_data.create_tf_dataset_for_client(client)
            for client in sampled_clients
        ]
        server_state, train_metrics = iterative_process.next(
            server_state, sampled_train_data)

Is there a better way of performing Poisson subsampling, especially if the subsampling is applied in differentially private FL, so that the RDP accountant yields accurate privacy analysis results? What would be the best strategy for setting the value of lambda, other than using random values?

Poisson subsampling means each user is included independently with probability q. The number of users you get in each round from this process is approximately Poisson distributed if q is small. If you want to sample such that, in expectation, you have users_per_round users in a round, you could do the following:

    users_this_round = np.random.poisson(users_per_round)
    sampled_ids = np.random.choice(num_users, size=users_this_round, replace=False)

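For comparison, the per-user definition above can also be implemented directly. This is only a minimal sketch, assuming q = users_per_round / num_users; the helper name poisson_subsample is hypothetical and is not part of TFF or NumPy. Each user gets an independent coin flip, so the round size follows Binomial(num_users, q), changes from round to round, and can in principle be zero.

```python
import numpy as np

def poisson_subsample(num_users, q, rng):
    """Hypothetical helper: include each user independently with probability q."""
    # rng.random draws uniform values in [0, 1); a value below q means "include".
    mask = rng.random(num_users) < q
    # Indices of the included users, already unique and sorted.
    return np.flatnonzero(mask)

num_users = 500
users_per_round = 150
q = users_per_round / num_users  # assumed per-user inclusion probability

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
sampled_ids = poisson_subsample(num_users, q, rng)
```

The resulting sampled_ids can be used to index train_data.client_ids in the same way as in the question's loop.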
If you want to choose exactly users_per_round users (which is technically not Poisson subsampling), you could do this:

    sampled_ids = np.random.choice(num_users, size=users_per_round, replace=False)
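
To get a feel for how these options differ, the sketch below simulates the round sizes they produce. It is an illustration under stated assumptions, not a privacy analysis: num_trials is an arbitrary choice, and the Poisson count is clipped to num_users as a safeguard so that a draw without replacement could never be asked for more users than exist. With q = 150 / 500 = 0.3, q is not small, so the per-user scheme has round-size variance num_users * q * (1 - q) = 105, while a Poisson count has variance 150.

```python
import numpy as np

num_users = 500
users_per_round = 150
q = users_per_round / num_users
num_trials = 2000  # illustrative number of simulated rounds

rng = np.random.default_rng(0)

# Round sizes under independent per-user inclusion: Binomial(num_users, q).
bernoulli_sizes = (rng.random((num_trials, num_users)) < q).sum(axis=1)

# Round sizes when the count is drawn from Poisson(users_per_round),
# clipped so it never exceeds the number of available users.
poisson_sizes = np.minimum(rng.poisson(users_per_round, num_trials), num_users)

# Fixed-size sampling always returns exactly users_per_round distinct users.
fixed_ids = rng.choice(num_users, size=users_per_round, replace=False)

print("per-user inclusion:", bernoulli_sizes.mean(), bernoulli_sizes.var())
print("Poisson count:     ", poisson_sizes.mean(), poisson_sizes.var())
print("fixed size:        ", len(fixed_ids))
```

Both random schemes average about users_per_round users per round, but their spreads differ, which is one reason to check which sampling scheme a given privacy accountant assumes.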