
Loading binary data with FixedLengthRecordDataset in TensorFlow

I'm trying to figure out how to load a binary data file using FixedLengthRecordDataset:

import tensorflow as tf
import struct
import numpy as np

RAW_N = 2 + 20*20 + 1

def convert_binary_to_float_array(register):
    return struct.unpack('f'*RAW_N, register.numpy())

raw_dataset = tf.data.FixedLengthRecordDataset(filenames=['mydata.bin'],record_bytes=RAW_N*4)
float_ds = raw_dataset.map(map_func=convert_binary_to_float_array)

This code throws:

AttributeError: in user code:

tf-load-data.py:14 convert_binary_to_float_array  *
    return struct.unpack('f'*RAW_N, register.numpy())

AttributeError: 'Tensor' object has no attribute 'numpy'

numpy() is available if I try to iterate over the dataset:

raw_dataset = tf.data.FixedLengthRecordDataset(filenames=['mydata.bin'],record_bytes=RAW_N*4)

for register in raw_dataset:
    print(struct.unpack('f'*RAW_N, register.numpy()))

From the Tensor type description, I learned that numpy() is available only during eager execution. So I deduce that during the map() call the elements are passed as symbolic tensors rather than as EagerTensor objects.

How can I load this data into a dataset?

I'm using TensorFlow 2.4.1

Solution

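I would suggest using tf.io.decode_raw. It is a TensorFlow op, so it works inside map(), where the function is traced into a graph and numpy() is unavailable. Since I do not know what mydata.bin looks like, I created some dummy data: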

import random
import struct
import tensorflow as tf
import numpy as np

RAW_N = 2 + 20*20 + 1

# Dummy data: 4 records of RAW_N float32 values each
values = [random.uniform(1, 5000) for _ in range(RAW_N*4)]
with open('mydata.bin', 'wb') as f:
  f.write(struct.pack(f'{RAW_N*4}f', *values))

def convert_binary_to_float_array(register):
    return tf.io.decode_raw(register, out_type=tf.float32)

raw_dataset = tf.data.FixedLengthRecordDataset(filenames=['mydata.bin'], record_bytes=RAW_N*4)
raw_dataset = raw_dataset.map(convert_binary_to_float_array)

for register in raw_dataset:
  print(register)
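
To check that this decoding matches what struct.unpack would produce, here is a minimal sketch that writes known values and compares them with the decoded records. The file name check.bin, the constant NUM_RECORDS and the choice of test values are assumptions made for illustration; it also assumes the file is little-endian, which is the default for tf.io.decode_raw.

```python
import os
import struct
import tempfile

import tensorflow as tf

RAW_N = 2 + 20*20 + 1   # floats per record, as in the question
NUM_RECORDS = 3         # hypothetical record count for this check

# Write known values that are exactly representable as float32.
values = [i * 0.5 for i in range(RAW_N * NUM_RECORDS)]
path = os.path.join(tempfile.mkdtemp(), 'check.bin')  # hypothetical file name
with open(path, 'wb') as f:
    f.write(struct.pack(f'<{len(values)}f', *values))  # '<' = little-endian

dataset = tf.data.FixedLengthRecordDataset(filenames=[path], record_bytes=RAW_N * 4)
decoded = dataset.map(lambda record: tf.io.decode_raw(record, out_type=tf.float32))

# Iterating happens eagerly, so numpy() is available here.
records = [r.numpy() for r in decoded]
print(len(records), records[0].shape)
```

Each element of records should hold the RAW_N floats of one record, in the same order in which they were written.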

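If the file actually stores integers rather than floats, decode it with tf.io.decode_raw using an integer out_type and then convert to float with tf.cast. This is not equivalent to decoding as tf.float32 directly: decode_raw reinterprets the raw bytes, whereas tf.cast converts the numeric values.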
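
To see how reading bytes as floats differs from decoding integers and then casting, here is a small sketch using only the struct module. It stands in for the TensorFlow calls rather than using them: float() plays the role of tf.cast, and unpacking with 'f' plays the role of decode_raw with tf.float32. The value 1234 is arbitrary.

```python
import struct

raw = struct.pack('<i', 1234)   # four bytes holding the int32 value 1234

# Decode as an integer, then convert the value (analogous to decode_raw + tf.cast).
as_int = struct.unpack('<i', raw)[0]
converted = float(as_int)

# Read the same four bytes as a float32 (analogous to decode_raw with tf.float32).
reinterpreted = struct.unpack('<f', raw)[0]

print(converted)
print(reinterpreted)
```

The first value is 1234.0, while the second is a tiny subnormal number, so the right choice depends on how the file was written.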