Tags: python, python-3.x, tensorflow, tensorflow2.0, tensorflow-datasets

Loading binary data with FixedLengthRecordDataset in TensorFlow


I'm trying to figure out how to load a binary data file using FixedLengthRecordDataset:

import tensorflow as tf
import struct
import numpy as np

RAW_N = 2 + 20*20 + 1

def convert_binary_to_float_array(register):
    return struct.unpack('f'*RAW_N, register.numpy())

raw_dataset = tf.data.FixedLengthRecordDataset(filenames=['mydata.bin'],record_bytes=RAW_N*4)
float_ds = raw_dataset.map(map_func=convert_binary_to_float_array)

This code throws:

AttributeError: in user code:

tf-load-data.py:14 convert_binary_to_float_array  *
    return struct.unpack('f'*RAW_N, register.numpy())

AttributeError: 'Tensor' object has no attribute 'numpy'

numpy() is available if I try to iterate over the dataset:

raw_dataset = tf.data.FixedLengthRecordDataset(filenames=['mydata.bin'],record_bytes=RAW_N*4)

for register in raw_dataset:
    print(struct.unpack('f'*RAW_N, register.numpy()))

By reading the Tensor type description, I realized that numpy() is available only during eager execution. Thus, I can deduce that during the map() call the elements are not provided as EagerTensor.
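This deduction can be checked with a small probe. The following is only a sketch: the `probe` function, the `seen` list, and the toy three-element dataset are illustrative names and data, not part of the original code. It records whether TensorFlow is executing eagerly while `map()` calls the function.

```python
import tensorflow as tf

seen = []  # collects tf.executing_eagerly() each time probe() is called

def probe(x):
    # map() calls this while tracing the function into a graph,
    # so x is a symbolic tensor here, not an EagerTensor.
    seen.append(tf.executing_eagerly())
    return x

ds = tf.data.Dataset.from_tensor_slices([1, 2, 3]).map(probe)

eager_outside = tf.executing_eagerly()   # eager mode is the TF 2.x default
elements = [int(e.numpy()) for e in ds]  # numpy() works when iterating
```

With default settings, `seen` should contain only `False` values while `eager_outside` is `True`, which matches the error above: inside `map()` the function is traced in graph mode, and only the tensors produced by iterating over the dataset are eager.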

How can I load this data into a dataset?

I'm using TensorFlow 2.4.1.


Solution

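  • I would suggest using tf.io.decode_raw, which is a TensorFlow op and therefore works inside the graph that map() builds, so numpy() is not needed. Unfortunately, I do not know what mydata.bin looks like, so I created some dummy data: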

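    import random
    import struct
    import tensorflow as tf

    RAW_N = 2 + 20*20 + 1  # number of 4-byte values per record

    # Dummy data: RAW_N*4 random int32 values, i.e. 4 records of RAW_N values.
    # Note: these are integers, so decoding them as float32 below gives
    # tiny, meaningless floats; real float32 data decodes to its stored values.
    values = random.sample(range(1, 5000), RAW_N*4)
    with open('mydata.bin', 'wb') as f:
        f.write(struct.pack('%di' % (RAW_N*4), *values))

    def convert_binary_to_float_array(register):
        # Reinterpret the record's bytes as float32 values (works in graph mode).
        return tf.io.decode_raw(register, out_type=tf.float32)

    # Read the same file that was written above, one RAW_N*4-byte record at a time.
    raw_dataset = tf.data.FixedLengthRecordDataset(filenames=['mydata.bin'], record_bytes=RAW_N*4)
    raw_dataset = raw_dataset.map(convert_binary_to_float_array)

    for register in raw_dataset:
        print(register)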
    

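    You could also try first decoding your data into integers with tf.io.decode_raw (out_type=tf.int32) and then casting to float with tf.cast. Note that this converts the integer values numerically instead of reinterpreting the bytes, so it only makes sense if the file actually stores integers.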
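To make the two decoding options concrete, here is a minimal sketch under stated assumptions: the file `demo.bin`, its single record, and the values 0.0, 1.0, 2.0, ... are all made up for the demonstration, and the data is assumed to be little-endian float32 (the default byte order for `tf.io.decode_raw`). It compares decoding directly as float32 with decoding as int32 followed by `tf.cast`.

```python
import os
import struct
import tempfile

import tensorflow as tf

RAW_N = 2 + 20 * 20 + 1  # values per record, as in the question

# Hypothetical demo file: one record of little-endian float32 values 0.0, 1.0, ...
values = [float(i) for i in range(RAW_N)]
path = os.path.join(tempfile.mkdtemp(), "demo.bin")
with open(path, "wb") as f:
    f.write(struct.pack("<%df" % RAW_N, *values))

raw = tf.data.FixedLengthRecordDataset(filenames=[path], record_bytes=RAW_N * 4)

# Option 1: reinterpret each 4-byte group as a float32.
as_float = raw.map(lambda rec: tf.io.decode_raw(rec, out_type=tf.float32))

# Option 2: read each 4-byte group as an int32, then convert numerically.
as_int_cast = raw.map(
    lambda rec: tf.cast(tf.io.decode_raw(rec, out_type=tf.int32), tf.float32))

decoded = next(iter(as_float)).numpy().tolist()
cast = next(iter(as_int_cast)).numpy().tolist()
```

For float32 data, `decoded` equals the values that were written, while `cast` does not: the stored value 1.0 has the bit pattern 0x3F800000, which option 2 reads as the integer 1065353216. So the choice does make a difference, and `out_type` should match the type the file was written with.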