
error while using tf.sparse.to_dense function


I'm trying to parse my TFRecord dataset to use it for object detection. When I try to convert my sparse tensors to dense tensors, I get the following error, which I can't understand:


ValueError: Shapes must be equal rank, but are 1 and 0
    From merging shape 3 with other shapes. for '{{node stack}} = Pack[N=5, T=DT_FLOAT, axis=1](SparseToDense, SparseToDense_1, SparseToDense_2, SparseToDense_3, Cast)' with input shapes: [?], [?], [?], [?], [].

My feature_description is:

feature_description = {
    'image/filename': tf.io.FixedLenFeature([], tf.string),
    'image/encoded': tf.io.FixedLenFeature([], tf.string),
    'image/object/bbox/xmin': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/ymin': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/xmax': tf.io.VarLenFeature(tf.float32),
    'image/object/bbox/ymax': tf.io.VarLenFeature(tf.float32),
    'image/object/class/label': tf.io.VarLenFeature(tf.int64),
}

My parsing code:

def _parse_image_function(example_proto):
  # Parse the input tf.Example proto using the dictionary above.
  return tf.io.parse_single_example(example_proto, feature_description)

def _parse_tfrecord(x):  
    x_train = tf.image.decode_jpeg(x['image/encoded'], channels=3)
    x_train = tf.image.resize(x_train, (416, 416))    
    labels = tf.cast(1, tf.float32)
#    print(type(x['image/object/bbox/xmin']))
    tf.print(x['image/object/bbox/xmin'])

    y_train = tf.stack([tf.sparse.to_dense(x['image/object/bbox/xmin']),
                        tf.sparse.to_dense(x['image/object/bbox/ymin']),
                        tf.sparse.to_dense(x['image/object/bbox/xmax']),
                        tf.sparse.to_dense(x['image/object/bbox/ymax']),
                        labels], axis=1)

    paddings = [[0, 100 - tf.shape(y_train)[0]], [0, 0]]
    y_train = tf.pad(y_train, paddings)
    return x_train, y_train


def load_tfrecord_dataset(train_record_file, size=416):

    dataset=tf.data.TFRecordDataset(train_record_file)
    parsed_dataset = dataset.map(_parse_image_function)
    final = parsed_dataset.map(_parse_tfrecord)
    return final


load_tfrecord_dataset(train_record_file,416)

I used a for loop to check whether something was wrong with my data, and tf.sparse.to_dense did its job perfectly inside the loop, but when I use .map(_parse_tfrecord) it gives me the error above.

Result of printing x['image/object/bbox/xmin'] inside _parse_tfrecord(x):

SparseTensor(indices=Tensor("DeserializeSparse_1:0", shape=(None, 1), dtype=int64), values=Tensor("DeserializeSparse_1:1", shape=(None,), dtype=float32)

Result of printing x['image/object/bbox/xmin'] in the for loop:

SparseTensor(indices=[[0]
 [1]
 [2]
 ...
 [4]
 [5]
 [6]], values=[0.115384616 0.432692319 0.75 ... 0.581730783 0.0817307681 0.276442319], shape=[7])

My for loop:

for x in parsed_dataset:
    tf.print(x['image/object/bbox/xmin'])
    break

What is my mistake here?


Solution

  • The problem is that labels has shape (), that is, zero dimensions (it is a scalar), while the sparse tensors you are trying to stack are all one-dimensional. You should make a label tensor with the same shape as the box data tensors:

    # Assuming all box data tensors have the same shape
    box_data_shape = tf.shape(x['image/object/bbox/xmin'])
    # Make label data
    labels = tf.ones(box_data_shape, dtype=tf.float32)
    

    In addition, since you are parsing individual examples, all your sparse tensors should be one-dimensional and contiguous, so you can skip the conversion to dense and just take their .values:

    y_train = tf.stack([x['image/object/bbox/xmin'].values,
                        x['image/object/bbox/ymin'].values,
                        x['image/object/bbox/xmax'].values,
                        x['image/object/bbox/ymax'].values,
                        labels], axis=1)