
How can I merge two (or more) TensorFlow datasets?


I have fetched the CelebA dataset, which comes with three partitions, as follows:

>>> import tensorflow_datasets as tfds
>>> celeba_bldr = tfds.builder('celeb_a')
>>> celeba_bldr.download_and_prepare()
>>> datasets = celeba_bldr.as_dataset()
>>> datasets.keys()
dict_keys(['test', 'train', 'validation'])

ds_train = datasets['train']
ds_test = datasets['test']
ds_valid = datasets['validation']

Now I want to merge them all into one dataset. For example, I need to combine the train and validation sets, or possibly merge all three and then re-split them according to a subject-disjoint criterion of my own. Is there any way to do that?

I could not find any option to do this in the docs: https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/data/Dataset


Solution

  • According to the docs you linked, tf.data.Dataset has a concatenate method, so you should be able to build a joint dataset like this:

    ds_train = datasets['train']
    ds_test = datasets['test']
    ds_valid = datasets['validation']
    
    ds = ds_train.concatenate(ds_test).concatenate(ds_valid)
    

    See: https://www.tensorflow.org/versions/r2.0/api_docs/python/tf/data/Dataset#concatenate
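
To try the idea without downloading CelebA, here is a minimal sketch that concatenates three toy datasets and then re-splits the result with filter. The `make_split` helper, the `subject_id` key, and the even/odd rule are all hypothetical stand-ins for your own data and criterion; the only requirement carried over from the real case is that every dataset being concatenated has the same element structure and dtypes.

```python
import tensorflow as tf


def make_split(start, stop):
    """Build a toy split whose elements are dicts with one int64 field.

    The "subject_id" key is a hypothetical placeholder, not a CelebA feature.
    """
    ids = tf.range(start, stop, dtype=tf.int64)
    return tf.data.Dataset.from_tensor_slices({"subject_id": ids})


# Toy stand-ins for datasets['train'], datasets['validation'], datasets['test'].
ds_train = make_split(0, 6)
ds_valid = make_split(6, 8)
ds_test = make_split(8, 10)

# concatenate appends the argument's elements after the caller's elements.
ds_all = ds_train.concatenate(ds_valid).concatenate(ds_test)

# Hypothetical subject-disjoint criterion: even ids in one split, odd in the other.
new_train = ds_all.filter(lambda ex: tf.equal(ex["subject_id"] % 2, 0))
new_test = ds_all.filter(lambda ex: tf.equal(ex["subject_id"] % 2, 1))

# Iterate eagerly (TF 2.x) to inspect the results.
all_ids = [int(ex["subject_id"]) for ex in ds_all]
train_ids = [int(ex["subject_id"]) for ex in new_train]
test_ids = [int(ex["subject_id"]) for ex in new_test]
print(all_ids, train_ids, test_ids)
```

Because filter evaluates its predicate per element, the two new splits are disjoint as long as the predicates are mutually exclusive. The merged dataset keeps the original order (train, then validation, then test), so you may want to shuffle it before training.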