I have a dataset represented as a NumPy matrix of shape (num_features, num_examples), and I wish to convert it to the TensorFlow type tf.data.Dataset.
I am struggling to understand the difference between these two methods: Dataset.from_tensors and Dataset.from_tensor_slices. Which is the right one, and why?
The TensorFlow documentation (link) says that both methods accept a nested structure of tensors, although when using from_tensor_slices the tensors must have the same size in the 0th dimension.
from_tensors combines the input and returns a dataset with a single element:
>>> t = tf.constant([[1, 2], [3, 4]])
>>> ds = tf.data.Dataset.from_tensors(t)
>>> [x for x in ds]
[<tf.Tensor: shape=(2, 2), dtype=int32, numpy=
array([[1, 2],
       [3, 4]], dtype=int32)>]
from_tensor_slices creates a dataset with a separate element for each row of the input tensor:
>>> t = tf.constant([[1, 2], [3, 4]])
>>> ds = tf.data.Dataset.from_tensor_slices(t)
>>> [x for x in ds]
[<tf.Tensor: shape=(2,), dtype=int32, numpy=array([1, 2], dtype=int32)>,
 <tf.Tensor: shape=(2,), dtype=int32, numpy=array([3, 4], dtype=int32)>]
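Applied to the matrix in the question, a minimal sketch might look like the following. It assumes you want one dataset element per example; since from_tensor_slices slices along the 0th dimension and the matrix is laid out as (num_features, num_examples), the matrix is transposed first. The sizes (3 features, 5 examples) and the variable names are made up for illustration.

```python
import numpy as np
import tensorflow as tf

# Hypothetical data laid out as (num_features, num_examples), as in the question.
num_features, num_examples = 3, 5
data = np.arange(num_features * num_examples, dtype=np.float32)
data = data.reshape(num_features, num_examples)

# from_tensor_slices slices along dimension 0, so transpose to
# (num_examples, num_features) to get one element per example.
ds_examples = tf.data.Dataset.from_tensor_slices(data.T)

# from_tensors wraps the whole matrix as a single dataset element.
ds_whole = tf.data.Dataset.from_tensors(data)

# Collect the shape of every element in each dataset.
example_shapes = [x.numpy().shape for x in ds_examples]
whole_shapes = [x.numpy().shape for x in ds_whole]

print(len(example_shapes), example_shapes[0])
print(len(whole_shapes), whole_shapes[0])
```

Without the transpose, from_tensor_slices would produce one element per feature instead, each of shape (num_examples,), which is usually not what you want for training.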