Tags: python, numpy, tensorflow, deep-learning, tensorflow-datasets

Extract only a portion of a numpy array from tf.data


I have a NumPy array of shape (500, 36, 24, 72) and want to build a data pipeline for it using tf.data. For every iteration, only a subset of the array is required: for example, the model is first trained on [500, x:y, 24, 72], where only a slice of the second dimension is taken.

ds1 = tf.data.Dataset.zip((tf.data.Dataset.from_tensor_slices(data)))

Applying a filter to the dataset above doesn't seem to work:

ds2 = ds1.filter(lambda x: x[1:3][:][:])

Solution

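  • Use tf.data.Dataset.map instead of filter. filter expects a predicate that returns a scalar boolean and only decides which elements are kept, so it cannot change an element's shape; map applies a function to every element, so it can slice it: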

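    import numpy as np
    import tensorflow as tf

    data = np.random.random((500, 36, 24, 72))
    # from_tensor_slices splits along axis 0, so each element has shape (36, 24, 72).
    # (Zipping a single dataset is a no-op, so the zip call is dropped here.)
    ds1 = tf.data.Dataset.from_tensor_slices(data)
    # Slicing the first axis of each element selects indices 1 and 2 of the
    # original second dimension, giving elements of shape (2, 24, 72).
    ds2 = ds1.map(lambda x: x[1:3, ...])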
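
To make the x:y range from the question configurable, the same map call can be wrapped in a small function. The following is only a sketch: the helper name make_sliced_dataset, its start/stop/batch_size parameters, and the reduced array size (8 samples instead of 500, to keep it quick) are assumptions for illustration and are not part of the original answer.

```python
import numpy as np
import tensorflow as tf

# Small stand-in for the (500, 36, 24, 72) array, to keep the example fast.
data = np.random.random((8, 36, 24, 72)).astype(np.float32)


def make_sliced_dataset(array, start, stop, batch_size=4):
    """Hypothetical helper: batches of shape [batch, stop - start, 24, 72]."""
    ds = tf.data.Dataset.from_tensor_slices(array)
    # Each element is array[i] with shape (36, 24, 72); slice its first axis,
    # which is the second axis of the original array.
    ds = ds.map(lambda x: x[start:stop, ...])
    return ds.batch(batch_size)


ds = make_sliced_dataset(data, start=1, stop=3)
first_batch = next(iter(ds))

# Compare against the equivalent NumPy slice of the full array.
batch_shape = tuple(first_batch.shape)
matches_numpy = np.allclose(first_batch.numpy(), data[:4, 1:3])
```

Calling the helper again with a different start and stop gives a dataset over another slice of the second dimension, without copying or re-slicing the NumPy array up front.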