Tags: python, tensorflow, object-detection-api

Invalid argument: Input to reshape is a tensor with x values, but requested shape requires a multiple of y. {node Reshape_13}


I am using TensorFlow's Object Detection API with faster_rcnn_resnet101 and get the following error when trying to train:

tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
  (0) Invalid argument: Input to reshape is a tensor with 36 values, but the requested shape requires a multiple of 16
     [[{{node Reshape_13}}]]
     [[IteratorGetNext]]
     [[IteratorGetNext/_7243]]
  (1) Invalid argument: Input to reshape is a tensor with 36 values, but the requested shape requires a multiple of 16
     [[{{node Reshape_13}}]]
     [[IteratorGetNext]]
0 successful operations. 0 derived errors ignored.

I am using a slightly modified version of the pets-train.sh file to run the training (only paths have been altered). I am trying to train on tf.record files containing jpg images of size (1280, 720) and have made no changes to the network architecture (I have confirmed that all images in the record are of this size).

Curiously, I can successfully run inference on these images when I do something equivalent to what's in the tutorial file detect_pets.py. This makes me think something is wrong with the way I've created the tf.record files (code below) rather than anything to do with the shape of the images, despite the error mentioning reshape. However, I've successfully trained on tf.records created the same way before (from images of size (600, 600), (1024, 1024), and (720, 480), all with the same network). Moreover, I've previously encountered a similar error (the numbers were different, but it was still at node Reshape_13) on a different dataset of (600, 600) images.
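One way to test that suspicion directly is to walk a record and check that the per-image feature lists agree in length. This is a minimal sketch, assuming TF 1.x's tf.python_io.tf_record_iterator and the feature keys used in my code below:

    import tensorflow as tf

    def check_lengths(record_path):
        # Parse each serialized tf.train.Example and compare the lengths of
        # the box, class, and difficult lists, which should all match.
        for i, serialized in enumerate(tf.python_io.tf_record_iterator(record_path)):
            ex = tf.train.Example()
            ex.ParseFromString(serialized)
            feat = ex.features.feature
            n_boxes = len(feat["image/object/bbox/xmin"].float_list.value)
            n_classes = len(feat["image/object/class/label"].int64_list.value)
            n_difficult = len(feat["image/object/difficult"].int64_list.value)
            if not (n_boxes == n_classes == n_difficult):
                print(f"example {i}: {n_boxes} boxes, {n_classes} classes, "
                      f"{n_difficult} difficult flags")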

I am using Python 3.7, TensorFlow 1.14.0, and CUDA 10.2 on Ubuntu 18.04.

I've looked extensively at various other posts (here, here, here, here, and here) but I wasn't able to make any progress.

I've tried adjusting the keep_aspect_ratio_resizer parameters (originally min_dimension=600, max_dimension=1024, but I've also tried min, max = (720, 1280), and pad_to_max_dimension: true with both of these min/max choices).
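For reference, the resizer block I'm editing looks like this (shown with the (720, 1280) variant and padding enabled):

    image_resizer {
      keep_aspect_ratio_resizer {
        min_dimension: 720
        max_dimension: 1280
        pad_to_max_dimension: true
      }
    }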

This is the code I'm using to create the tf.record files:

import glob
import io
import os

import numpy as np
import pandas as pd
import PIL.Image
import tensorflow as tf

# int64_feature, bytes_feature, float_list_feature, etc. are the usual
# helpers from the Object Detection API's dataset_util; tf_datadir and
# processed_datadir are path variables defined elsewhere in my script.
from object_detection.utils.dataset_util import (
    bytes_feature,
    bytes_list_feature,
    float_list_feature,
    int64_feature,
    int64_list_feature,
)


def make_example(imfile, boxes):
    with tf.gfile.GFile(imfile, "rb") as fid:
        encoded_jpg = fid.read()

    encoded_jpg_io = io.BytesIO(encoded_jpg)
    image = PIL.Image.open(encoded_jpg_io)
    if image.format != "JPEG":
        raise Exception("Images need to be in JPG format")

    height = image.height
    width = image.width

    # Convert (xc, yc, w, h) boxes to corner coordinates, clip them to the
    # image bounds, and keep a box only if enough of it survives clipping.
    xmins = []
    xmaxs = []
    ymins = []
    ymaxs = []
    for box in boxes:
        xc, yc, w, h = box
        xmin = xc - w / 2
        xmax = xc + w / 2
        ymin = yc - h / 2
        ymax = yc + h / 2

        new_xmin = np.clip(xmin, 0, width - 1)
        new_xmax = np.clip(xmax, 0, width - 1)
        new_ymin = np.clip(ymin, 0, height - 1)
        new_ymax = np.clip(ymax, 0, height - 1)

        area = (ymax - ymin) * (xmax - xmin)
        new_area = (new_ymax - new_ymin) * (new_xmax - new_xmin)
        if new_area > 0.3 * area:
            xmins.append(new_xmin / width)
            xmaxs.append(new_xmax / width)
            ymins.append(new_ymin / height)
            ymaxs.append(new_ymax / height)

    classes_text = ["vehicle".encode("utf8")] * len(boxes)
    classes = [1] * len(boxes)
    abs_imfile = os.path.abspath(imfile)
    difficult = [0] * len(boxes)

    example = tf.train.Example(
        features=tf.train.Features(
            feature={
                "image/height": int64_feature(height),
                "image/width": int64_feature(width),
                "image/filename": bytes_feature(imfile.encode("utf8")),
                "image/source_id": bytes_feature(abs_imfile.encode("utf8")),
                "image/encoded": bytes_feature(encoded_jpg),
                "image/format": bytes_feature("jpeg".encode("utf8")),
                "image/object/bbox/xmin": float_list_feature(xmins),
                "image/object/bbox/xmax": float_list_feature(xmaxs),
                "image/object/bbox/ymin": float_list_feature(ymins),
                "image/object/bbox/ymax": float_list_feature(ymaxs),
                "image/object/class/text": bytes_list_feature(classes_text),
                "image/object/class/label": int64_list_feature(classes),
                "image/object/difficult": int64_list_feature(difficult),
            }
        )
    )
    return example


def make_tfrecord(outfile, imfiles, truthfiles):
    writer = tf.python_io.TFRecordWriter(outfile)

    for imfile, truthfile in zip(imfiles, truthfiles):
        print(imfile)
        boxes = pd.read_csv(truthfile)
        if boxes.empty:
            boxes = []
        else:
            boxes = [
                (box.Xc, box.Yc, box.Width, box.Height) for box in boxes.itertuples()
            ]

        example = make_example(imfile, boxes)
        writer.write(example.SerializeToString())
    writer.close()


def make_combined_train_dset(names):
    imfiles = []
    truthfiles = []
    traindir = os.path.join(tf_datadir, "train")
    valdir = os.path.join(tf_datadir, "val")

    for name in names:
        imdir = os.path.join(processed_datadir, name, "images")
        truthdir = os.path.join(processed_datadir, name, "truth")

        imfiles.extend(sorted(glob.glob(os.path.join(imdir, "*.jpg"))))
        truthfiles.extend(sorted(glob.glob(os.path.join(truthdir, "*.csv"))))

    # Shuffle images and truth files with the same permutation, then split 90/10.
    inds = list(range(len(imfiles)))
    np.random.shuffle(inds)
    imfiles = [imfiles[i] for i in inds]
    truthfiles = [truthfiles[i] for i in inds]

    ntrain = round(0.9 * len(imfiles))
    train_imfiles = imfiles[:ntrain]
    train_truthfiles = truthfiles[:ntrain]
    val_imfiles = imfiles[ntrain:]
    val_truthfiles = truthfiles[ntrain:]

    chunksize = 1500

    for d in [traindir, valdir]:
        if not os.path.exists(d):
            os.mkdir(d)

    # Write the examples in chunks of 1500 per .tfrecord file.
    for i in range(0, len(train_imfiles), chunksize):
        print(f"{i} / {len(train_imfiles)}", end="\r")
        cur_imfiles = train_imfiles[i : i + chunksize]
        cur_truthfiles = train_truthfiles[i : i + chunksize]
        testfile = os.path.join(traindir, f"{i}.tfrecord")
        make_tfrecord(testfile, cur_imfiles, cur_truthfiles)

    for i in range(0, len(val_imfiles), chunksize):
        print(f"{i} / {len(val_imfiles)}", end="\r")
        cur_imfiles = val_imfiles[i : i + chunksize]
        cur_truthfiles = val_truthfiles[i : i + chunksize]
        testfile = os.path.join(valdir, f"{i}.tfrecord")
        make_tfrecord(testfile, cur_imfiles, cur_truthfiles)


def make_train_dset(name, train_inc=1, val_inc=1, test_inc=1):
    trainfile = os.path.join(tf_datadir, name + "-train.tfrecord")
    valfile = os.path.join(tf_datadir, name + "-val.tfrecord")

    imdir = os.path.join(processed_datadir, name, "images")
    truthdir = os.path.join(processed_datadir, name, "truth")

    imfiles = sorted(glob.glob(os.path.join(imdir, "*.jpg")))
    truthfiles = sorted(glob.glob(os.path.join(truthdir, "*.csv")))

    n = len(imfiles)
    ntrain = round(0.9 * n)

    print(trainfile)
    make_tfrecord(trainfile, imfiles[:ntrain:train_inc], truthfiles[:ntrain:train_inc])
    print(valfile)
    make_tfrecord(valfile, imfiles[ntrain::val_inc], truthfiles[ntrain::val_inc])

With other datasets, I've been able to create tf.records using make_combined_train_dset (or make_train_dset) as defined above, point the faster_rcnn_resnet101.config file at them, and training proceeded normally, just as in the tutorial example; the relevant input section of the config is sketched below. With this new dataset (and at least one other), I get the aforementioned reshape error. Nonetheless, I can still run inference on the images in this dataset, which makes me think the problem is with the tf.records, or with how they are being read in, rather than an intrinsic problem with the images or their size.
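For completeness, the input section of the config looks roughly like this (paths are placeholders for my actual ones):

    train_input_reader {
      label_map_path: "PATH_TO/label_map.pbtxt"
      tf_record_input_reader {
        input_path: "PATH_TO/train/*.tfrecord"
      }
    }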

Any assistance anyone can offer would be greatly appreciated, as I've struggled with this for several days now.


Solution

  • I'm an idiot: confirmed.

    The problem was that classes_text, classes, and difficult were the wrong length: they were sized by len(boxes), but the clipping loop drops boxes whose clipped area falls below the 30% threshold, so the box lists (xmins, etc.) can end up shorter than the label lists.

    Replaced

    classes_text = ["vehicle".encode("utf8")] * len(boxes)
    classes = [1] * len(boxes)
    difficult = [0] * len(boxes)
    

    with

    classes_text = ["vehicle".encode("utf8")] * len(xmins)
    classes = [1] * len(xmins)
    difficult = [0] * len(xmins)
    

    and it runs fine. Posting this in case anyone else struggles with a similar issue.
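
    In hindsight, a more robust pattern is to append to every per-box list inside the same filtered loop, so the lengths can never diverge. A sketch using the same names as the code above:

        xmins, xmaxs, ymins, ymaxs = [], [], [], []
        classes_text, classes, difficult = [], [], []
        for box in boxes:
            xc, yc, w, h = box
            xmin, xmax = xc - w / 2, xc + w / 2
            ymin, ymax = yc - h / 2, yc + h / 2

            new_xmin = np.clip(xmin, 0, width - 1)
            new_xmax = np.clip(xmax, 0, width - 1)
            new_ymin = np.clip(ymin, 0, height - 1)
            new_ymax = np.clip(ymax, 0, height - 1)

            area = (ymax - ymin) * (xmax - xmin)
            new_area = (new_ymax - new_ymin) * (new_xmax - new_xmin)
            if new_area > 0.3 * area:
                # Every list is appended to in the same branch, so the box,
                # class, and difficult lists always stay the same length.
                xmins.append(new_xmin / width)
                xmaxs.append(new_xmax / width)
                ymins.append(new_ymin / height)
                ymaxs.append(new_ymax / height)
                classes_text.append("vehicle".encode("utf8"))
                classes.append(1)
                difficult.append(0)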

    Thank you to anyone who put time or thought into my question. Hope this helps someone not waste time.