
Creating a tf.data.Dataset from a NumPy array with shape (890,2048,3)


I am working on a PointNet implementation for point cloud registration. For that, I created 890 pairs of source and target point clouds, each stored in a NumPy array with shape=(2048,3). I then stacked all source arrays and all target arrays into two large arrays with shape=(890,2048,3). Now I want to build an input pipeline for a TensorFlow model. How do I create a TensorFlow dataset from these two NumPy arrays, and how do I check whether it worked? I tried:

data1 = tf.data.Dataset.from_tensor_slices((source, targ))
data1

But I only get:

<TensorSliceDataset element_spec=(TensorSpec(shape=(2048, 3), dtype=tf.float64, name=None), TensorSpec(shape=(2048, 3), dtype=tf.float64, name=None))>

as output.

I would appreciate any help or pointers on where to look. :)
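For reference, the stacking step described above can be sketched with np.stack, which adds a new leading axis to a sequence of equally shaped arrays. This is a minimal sketch assuming the 890 clouds are held in two Python lists; the names `source_clouds` and `target_clouds` are hypothetical and are filled with random data here. The cast to `float32` is also an assumption: it halves memory compared with the `float64` shown in the `element_spec` output and matches the default Keras float type.

```python
import numpy as np

# Hypothetical stand-ins for the 890 per-cloud arrays, each of shape (2048, 3)
source_clouds = [np.random.normal(size=(2048, 3)) for _ in range(890)]
target_clouds = [np.random.normal(size=(2048, 3)) for _ in range(890)]

# np.stack adds a new leading axis, giving shape (890, 2048, 3);
# the float32 cast is optional and only reduces memory use
source = np.stack(source_clouds, axis=0).astype(np.float32)
targ = np.stack(target_clouds, axis=0).astype(np.float32)

print(source.shape)  # (890, 2048, 3)
print(targ.shape)    # (890, 2048, 3)
```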


Solution

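  • The dataset itself was created correctly. Evaluating a dataset object only prints its element_spec, which describes the shape and dtype of a single element, not the data. To inspect the contents, iterate over a few elements, for example with take(). Without batching, each element is one (source, target) pair of shape (2048, 3): from_tensor_slices slices along the first axis and adds no batch dimension. To feed the model several pairs per step, batch the dataset.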

    Contrast

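    import numpy as np
    import tensorflow as tf

    source = np.random.normal(size=(890, 2048, 3))
    targ = np.random.normal(size=(890, 2048, 3))

    data1 = tf.data.Dataset.from_tensor_slices((source, targ))

    # Without batching, each element is a single (source, target) pair
    for x, y in data1.take(1):
        print(x.shape)
        print(y.shape)

    # Output:
    # (2048, 3)
    # (2048, 3)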
    

    with

    source = np.random.normal(size=(890, 2048, 3))
    targ = np.random.normal(size=(890, 2048, 3))

    data1 = tf.data.Dataset.from_tensor_slices((source, targ))
    data1 = data1.batch(8)  # or any convenient batch size

    # Each element is now a batch of 8 pairs
    for x, y in data1.take(1):
        print(x.shape)
        print(y.shape)

    # Output:
    # (8, 2048, 3)
    # (8, 2048, 3)
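Since the question also asks how to check that the dataset was built correctly, here is a sketch of a few checks followed by a typical training pipeline. It assumes TensorFlow 2.x with eager execution (`tf.data.AUTOTUNE` needs 2.4 or later) and uses random `float32` stand-ins for the real arrays; the shuffle buffer size and batch size of 8 are illustrative choices, not requirements.

```python
import numpy as np
import tensorflow as tf

# Random stand-ins for the real (890, 2048, 3) arrays
source = np.random.normal(size=(890, 2048, 3)).astype(np.float32)
targ = np.random.normal(size=(890, 2048, 3)).astype(np.float32)

data1 = tf.data.Dataset.from_tensor_slices((source, targ))

# 1. The spec describes a single element: one (source, target) pair
print(data1.element_spec)

# 2. Number of elements: one per point cloud pair
print(int(data1.cardinality()))  # 890

# 3. The first element should match the first entries of the NumPy arrays
first_src, first_tgt = next(iter(data1))
assert np.array_equal(first_src.numpy(), source[0])
assert np.array_equal(first_tgt.numpy(), targ[0])

# 4. A typical training pipeline: shuffle, batch, prefetch.
#    890 is not a multiple of 8, so without drop_remainder the last batch
#    would hold only 2 pairs; drop_remainder=True discards it.
batched = (
    data1.shuffle(buffer_size=890)
    .batch(8, drop_remainder=True)
    .prefetch(tf.data.AUTOTUNE)
)
print(int(batched.cardinality()))  # 111

for x, y in batched.take(1):
    print(x.shape, y.shape)  # (8, 2048, 3) (8, 2048, 3)
```

Whether to drop the final partial batch depends on the model: keeping it is usually fine for evaluation, while dropping it guarantees a fixed batch dimension during training.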