I'm trying to figure out how to load binary data file using FixedLengthRecordDataset
:
import tensorflow as tf
import struct
import numpy as np
RAW_N = 2 + 20*20 + 1
def convert_binary_to_float_array(register):
return struct.unpack('f'*RAW_N, register.numpy())
raw_dataset = tf.data.FixedLengthRecordDataset(filenames=['mydata.bin'],record_bytes=RAW_N*4)
float_ds = raw_dataset.map(map_func=convert_binary_to_float_array)
This code throws:
AttributeError: in user code:
tf-load-data.py:14 convert_binary_to_float_array *
return struct.unpack('f'*RAW_N, register.numpy())
AttributeError: 'Tensor' object has no attribute 'numpy'
numpy()
is available if I try to iterate over the dataset:
raw_dataset = tf.data.FixedLengthRecordDataset(filenames=['mydata.bin'],record_bytes=RAW_N*4)
for register in raw_dataset:
print(struct.unpack('f'*RAW_N, register.numpy()))
By reading the Tensor type description, I realized that numpy()
is available only during eager execution. Thus, I can deduce that during the map()
call the elements are not provided as EagerTensor
.
How to load this data into a dataset?
I'm using TensorFlow 2.4.1
I would suggest working with tf.io.decode_raw. I unfortunately do not know what mydata.bin
looks like so I created some dummy data:
import random
import struct
import tensorflow as tf
import numpy as np
RAW_N = 2 + 20*20 + 1
bytess = random.sample(range(1, 5000), RAW_N*4)
with open('mydata.bin', 'wb') as f:
f.write(struct.pack('1612i', *bytess))
def convert_binary_to_float_array(register):
return tf.io.decode_raw(register, out_type=tf.float32)
raw_dataset = tf.data.FixedLengthRecordDataset(filenames=['/content/mydata.bin'], record_bytes=RAW_N*4)
raw_dataset = raw_dataset.map(convert_binary_to_float_array)
for register in raw_dataset:
print(register)
You could also try first decoding your data into integers with tf.io.decode_raw
and then casting to float with tf.cast
, but I am not sure if it will make a difference.