I am working on a project with TensorFlow Federated (TFF). I have managed to use the simulation libraries it provides to load, train, and test some datasets.
For example, I load the EMNIST dataset:
emnist_train, emnist_test = tff.simulation.datasets.emnist.load_data()
and the datasets returned by load_data() are instances of tff.simulation.ClientData. This interface lets me iterate over client ids and select subsets of the data for simulations.
len(emnist_train.client_ids)
3383
emnist_train.element_type_structure
OrderedDict([('pixels', TensorSpec(shape=(28, 28), dtype=tf.float32, name=None)), ('label', TensorSpec(shape=(), dtype=tf.int32, name=None))])
example_dataset = emnist_train.create_tf_dataset_for_client(
    emnist_train.client_ids[0])
I am trying to load the fashion_mnist dataset with Keras to perform some federated operations:
fashion_train, fashion_test = tf.keras.datasets.fashion_mnist.load_data()
but I get this error:
AttributeError: 'tuple' object has no attribute 'element_spec'
because Keras returns a tuple of NumPy arrays instead of a tff.simulation.ClientData like before. The error comes from the input_spec argument in the following code:
def tff_model_fn() -> tff.learning.Model:
    return tff.learning.from_keras_model(
        keras_model=factory.retrieve_model(True),
        input_spec=fashion_test.element_spec,
        loss=loss_builder(),
        metrics=metrics_builder())

iterative_process = tff.learning.build_federated_averaging_process(
    tff_model_fn, Parameters.server_adam_optimizer_fn, Parameters.client_adam_optimizer_fn)
server_state = iterative_process.initialize()
To sum up: is there any way to create tff.simulation.ClientData objects from the tuple of NumPy arrays that Keras returns?
Another solution that comes to mind is to use tff.simulation.HDF5ClientData and manually load the appropriate files in HDF5 format (train.h5, test.h5) to get the tff.simulation.ClientData, but my problem is that I can't find the URL for fashion_mnist in HDF5 format. I mean something like this, for both train and test:
fileprefix = 'fed_emnist_digitsonly'
sha256 = '55333deb8546765427c385710ca5e7301e16f4ed8b60c1dc5ae224b42bd5b14b'
filename = fileprefix + '.tar.bz2'
path = tf.keras.utils.get_file(
    filename,
    origin='https://storage.googleapis.com/tff-datasets-public/' + filename,
    file_hash=sha256,
    hash_algorithm='sha256',
    extract=True,
    archive_format='tar',
    cache_dir=cache_dir)
dir_path = os.path.dirname(path)
train_client_data = hdf5_client_data.HDF5ClientData(
    os.path.join(dir_path, fileprefix + '_train.h5'))
test_client_data = hdf5_client_data.HDF5ClientData(
    os.path.join(dir_path, fileprefix + '_test.h5'))
return train_client_data, test_client_data
My final goal is to make the fashion_mnist dataset work with TensorFlow Federated learning.
You're on the right track. To recap: the datasets returned by the tff.simulation.datasets APIs are tff.simulation.ClientData objects. The object returned by tf.keras.datasets.fashion_mnist.load_data is a tuple of NumPy arrays.
So what is needed is to implement a tff.simulation.ClientData that wraps the dataset returned by tf.keras.datasets.fashion_mnist.load_data. Some previous questions about implementing ClientData objects:
This does require answering an important question: how should the Fashion MNIST data be split into individual users? The dataset doesn't include features that could be used for partitioning. Researchers have come up with a few ways to partition the data synthetically, e.g. randomly sampling some labels for each participant, but the choice will have a great effect on model training, so it is worth investing some thought here.
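To make the "randomly sampling some labels for each participant" idea concrete, here is a minimal sketch of one possible label-skewed split. It is only an illustration, not a recommended or standard partitioning: the function name partition_by_label, the client id format, and the default of two labels per client are all assumptions made for this example. It uses only the Python standard library and returns example indices, so it makes no claims about any TFF API.

```python
import random
from collections import defaultdict


def partition_by_label(labels, num_clients, labels_per_client=2, seed=0):
    """Map hypothetical client ids to lists of example indices.

    Each synthetic client randomly picks `labels_per_client` labels and
    only receives examples carrying one of those labels.
    """
    rng = random.Random(seed)
    distinct = sorted(set(labels))

    # Each client randomly chooses which labels it is allowed to hold.
    client_labels = {
        f"client_{c}": rng.sample(distinct, labels_per_client)
        for c in range(num_clients)
    }

    # Group example indices by their label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    partition = {cid: [] for cid in client_labels}
    for label, indices in by_label.items():
        owners = [cid for cid, chosen in client_labels.items() if label in chosen]
        if not owners:
            continue  # no client picked this label, so its examples go unused
        rng.shuffle(indices)
        # Deal this label's examples round-robin to the clients that chose it.
        for i, idx in enumerate(indices):
            partition[owners[i % len(owners)]].append(idx)
    return partition
```

With the Keras arrays, something like partition_by_label(y_train.tolist(), num_clients=100) gives per-client index lists, and x_train[indices] and y_train[indices] then select one client's examples. A dict from client id to an OrderedDict with 'pixels' and 'label' arrays is the kind of structure that the tensor-slices-based ClientData helpers in TFF are meant to wrap (tff.simulation.FromTensorSlicesClientData in some releases, tff.simulation.datasets.TestClientData in others); which one is available depends on the installed version, so check the documentation for your release.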