Tags: python, numpy, tensorflow, tensorflow-datasets

tf.data: create a Dataset from a list of Numpy arrays of different shape


I have a list of NumPy arrays of different shapes.

I need to create a Dataset so that each time an element is requested, I get a tensor with the shape and values of the corresponding NumPy array.

How can I achieve this?

This is NOT working:

dataset = tf.data.Dataset.from_tensor_slices(list_of_arrays)

because, as expected, it fails with:

Can't convert non-rectangular Python sequence to Tensor.

P.S. I know it will not be possible to batch a Dataset whose elements have different shapes.


Solution

  • Have you tried converting the list to a ragged tensor first?

    import tensorflow as tf

    tensor_with_different_dimensions = tf.ragged.constant([[1, 2], [3], [4, 5, 6]])  # shape [3, None]

    Bear in mind that:

    All scalar values in pylist must have the same nesting depth K, and the returned RaggedTensor will have rank K. If pylist contains no scalar values, then K is one greater than the maximum depth of empty lists in pylist. All scalar values in pylist must be compatible with dtype.

    You can read more about it here: https://www.tensorflow.org/api_docs/python/tf/ragged/constant
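Building on the ragged-tensor suggestion, here is a minimal sketch of the whole pipeline, assuming TensorFlow 2.x in eager mode and that every array has the same number of dimensions (the same-nesting-depth rule quoted above). The example arrays and the name `list_of_arrays` are hypothetical. Each array is converted with `.tolist()` so that `tf.ragged.constant` receives plain nested Python lists; `Dataset.from_tensor_slices` then yields one ragged element per original array, and `.to_tensor()` turns it back into a dense tensor with that array's shape.

```python
import numpy as np
import tensorflow as tf

# Hypothetical input: 2-D arrays that share ndim but differ in shape.
list_of_arrays = [
    np.arange(6, dtype=np.float32).reshape(2, 3),
    np.arange(4, dtype=np.float32).reshape(4, 1),
    np.arange(3, dtype=np.float32).reshape(1, 3),
]

# Nested Python lists -> RaggedTensor of shape [3, None, None].
ragged = tf.ragged.constant([a.tolist() for a in list_of_arrays], dtype=tf.float32)

# Slicing along the outer dimension gives one RaggedTensor per original array;
# .to_tensor() converts each slice back to a dense tensor.
dataset = tf.data.Dataset.from_tensor_slices(ragged).map(lambda rt: rt.to_tensor())

for element, original in zip(dataset, list_of_arrays):
    print(element.shape, np.array_equal(element.numpy(), original))
```

Because each slice comes from a single rectangular NumPy array, `.to_tensor()` has no padding to add and restores the original shape. The example uses 2-D arrays; for 1-D arrays like the snippet above, check `dataset.element_spec` before adding the `.to_tensor()` step. If the arrays differ in their number of dimensions, the ragged approach does not apply, and a different technique such as `tf.data.Dataset.from_generator` with an `output_signature` of unknown-shape `tf.TensorSpec`s would be needed instead.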