
tensorflow create TFrecord from numpy opencv weird size


I am creating a TFRecord-formatted file for object detection using my own dataset. There are two methods to do that when we have the path of our image:


    with tf.gfile.GFile(path + filename, 'rb') as fid:
        encoded_image_data = fid.read()
    feat = {
        'image/height': dataset_util.int64_feature(height),
        'image/width': dataset_util.int64_feature(width),
        'image/filename': dataset_util.bytes_feature(filename),
        'image/source_id': dataset_util.bytes_feature(filename),
        'image/encoded': dataset_util.bytes_feature(encoded_image_data),
        'image/format': dataset_util.bytes_feature(image_format),
        'image/object/bbox/xmin': dataset_util.float_list_feature(xmins),
        'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs),
        'image/object/bbox/ymin': dataset_util.float_list_feature(ymins),
        'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs),
        'image/object/class/text': dataset_util.bytes_list_feature(classes_text),
        'image/object/class/label': dataset_util.int64_list_feature(classes)
        }
    tf_example = tf.train.Example(features=tf.train.Features(feature=feat))

Another method is to use OpenCV instead of `GFile`:

    img = cv2.imread(path + filename)  # decodes the JPEG into a pixel array
    img = img.astype(np.uint8)         # imread already returns uint8, so this is a no-op
    img_encoded = img.tostring()       # raw pixel bytes (tostring is a deprecated alias of tobytes)
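For scale (illustrative dimensions, not from the question): the decoded array holds one `uint8` per channel per pixel, so the byte string is always `height * width * 3` bytes long, regardless of how well the original .jpg was compressed:

```python
import numpy as np

# Stand-in for a decoded 640x480 BGR image, as cv2.imread would return it
img = np.zeros((480, 640, 3), dtype=np.uint8)

raw = img.tobytes()  # same bytes tostring() would produce
print(len(raw))      # 480 * 640 * 3 = 921600 bytes, even if the .jpg was tiny
```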

But there's something I don't understand. When I create the TFRecord with the GFile method, I get a file of about 10 MB (the size of my dataset in .jpg format). When I use the OpenCV method, I get a 93 MB file.

Why is the difference so large? How can I reduce the size with the OpenCV approach?

P.S.: I need the OpenCV route because I want to concatenate images to get 4 channels instead of 3.


Solution

  • The "first method" puts the raw JPEG bytes in the TFRecord and only decodes the image into a pixel array later, at read time. This has the benefit that your TFRecords are smaller, because the data is effectively still JPEG-compressed.

    The "OpenCV" method decodes the JPEG when you `imread`, so you're writing the decoded (and thus much heavier) pixel array into the TFRecord: three uncompressed bytes per pixel instead of compressed JPEG data.

    IMO, you're better off writing the images as JPEGs in the TFRecord and doing whatever concatenation you need inside TensorFlow (either via TF operations or `py_func`s).