
What is the difference between Dataset.from_tensors and Dataset.from_tensor_slices?


I have a dataset represented as a NumPy matrix of shape (num_features, num_examples) and I wish to convert it to a TensorFlow tf.data.Dataset.

I am struggling to understand the difference between these two methods: Dataset.from_tensors and Dataset.from_tensor_slices. Which one is right for this case, and why?

The TensorFlow documentation says that both methods accept a nested structure of tensors, although with from_tensor_slices all tensors must have the same size in the 0th dimension.


Solution

  • from_tensors combines the input and returns a dataset with a single element:

    >>> t = tf.constant([[1, 2], [3, 4]])
    >>> ds = tf.data.Dataset.from_tensors(t)
    >>> [x for x in ds]
    [<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
     array([[1, 2],
            [3, 4]], dtype=int32)>]
    

  • from_tensor_slices creates a dataset with a separate element for each row of the input tensor:

    >>> t = tf.constant([[1, 2], [3, 4]])
    >>> ds = tf.data.Dataset.from_tensor_slices(t)
    >>> [x for x in ds]
    [<tf.Tensor: shape=(2,), dtype=int32, numpy=array([1, 2], dtype=int32)>,
     <tf.Tensor: shape=(2,), dtype=int32, numpy=array([3, 4], dtype=int32)>]
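
For the matrix described in the question, the examples run along the second axis, while from_tensor_slices slices along the first. A minimal sketch of one way to handle this, assuming one dataset element per example is wanted, is to transpose first. The array contents and the variable names below are made up for illustration:

```python
import numpy as np
import tensorflow as tf

# Hypothetical data laid out as in the question: (num_features, num_examples).
num_features, num_examples = 3, 5
data = np.arange(num_features * num_examples, dtype=np.float32)
data = data.reshape(num_features, num_examples)

# from_tensor_slices slices along axis 0, so transpose to put examples first.
# Each element is then one example with num_features values.
ds = tf.data.Dataset.from_tensor_slices(data.T)

# Without the transpose, each element would be one feature across all examples.
ds_by_feature = tf.data.Dataset.from_tensor_slices(data)

# from_tensors keeps the whole matrix together as a single element.
ds_single = tf.data.Dataset.from_tensors(data)

print(len(list(ds)), len(list(ds_by_feature)), len(list(ds_single)))
```

With this layout, ds can be shuffled and batched per example, whereas ds_single would always yield the full matrix at once.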