I'm trying to implement an image-based regression using a CNN in libtorch. The problem is that my images have different sizes, which causes an exception when batching them.
First things first, I create my dataset:
auto set = MyDataSet(pathToData).map(torch::data::transforms::Stack<>());
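For context, MyDataSet is an ordinary custom dataset. It looks roughly like this (the member names and the loadImage helper are placeholders):
class MyDataSet : public torch::data::datasets::Dataset<MyDataSet> {
public:
    explicit MyDataSet(const std::string &pathToData) {
        // collect image paths and regression targets from pathToData
    }

    torch::data::Example<> get(size_t index) override {
        torch::Tensor image = loadImage(paths[index]); // [3, H, W], H and W vary per file
        return {image, targets[index]};
    }

    torch::optional<size_t> size() const override {
        return paths.size();
    }

private:
    std::vector<std::string> paths;
    std::vector<torch::Tensor> targets;
};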
Then I create the data loader:
auto dataLoader = torch::data::make_data_loader(
        std::move(set),
        torch::data::DataLoaderOptions().batch_size(batchSize).workers(numWorkersDataLoader)
);
The exception is thrown when batching data in the training loop:
for (torch::data::Example<> &batch : *dataLoader) {
    processBatch(model, optimizer, counter, batch);
}
with a batch size greater than 1 (with a batch size of 1 everything works fine, because no stacking is involved). For example, with a batch size of 2 I get the following error:
...
what(): stack expects each tensor to be equal size, but got [3, 1264, 532] at entry 0 and [3, 299, 294] at entry 1
I read that one could, for example, use collate_fn to implement some padding (for example here), but I don't see where to hook it in: torch::data::DataLoaderOptions, for example, does not offer such an option.
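The padding itself would be simple enough; a sketch of what I have in mind follows, assuming we pad every image up to the size of the largest image in the set. The question is really just where the libtorch API wants it:
torch::Tensor padToSize(torch::Tensor image, int64_t height, int64_t width) {
    namespace F = torch::nn::functional;
    // pad the last two dimensions: {left, right, top, bottom}
    return F::pad(image, F::PadFuncOptions(
            {0, width - image.size(2), 0, height - image.size(1)}));
}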
Does anyone know how to do this?
I've got a solution now. In summary, I split my CNN into conv and dense layers and use the output of a torch::nn::AdaptiveMaxPool2d in the batch construction.
In order to do so, I had to modify my Dataset, Net, and train/val/test methods. In my Net I added two additional forward functions. The first one passes data through all conv layers and returns the output of an AdaptiveMaxPool2d layer. The second one passes the data through all dense layers. In practice this looks like:
torch::Tensor forwardConLayer(torch::Tensor x) {
    x = torch::relu(conv1(x));
    x = torch::relu(conv2(x));
    x = torch::relu(conv3(x));
    // AdaptiveMaxPool2d maps any input size to a fixed output size,
    // so from here on the result no longer depends on the image dimensions
    x = torch::relu(ada1(x));
    x = torch::flatten(x); // input is a single unsqueezed entry, so this yields a 1-D feature vector
    return x;
}
torch::Tensor forwardDenseLayer(torch::Tensor x) {
    x = torch::relu(lin1(x));
    x = lin2(x);
    return x;
}
Then I override the get_batch method and use forwardConLayer to compute every batch entry. In order to train correctly, I call zero_grad() before I construct a batch. All in all this looks like:
std::vector<ExampleType> get_batch(at::ArrayRef<size_t> indices) override {
    // based on the default implementation in base.h
    this->net.zero_grad(); // clear gradients before the entries pass through the conv layers
    std::vector<ExampleType> batch;
    batch.reserve(indices.size());
    for (const auto i : indices) {
        ExampleType batchEntry = get(i);
        auto batchEntryData = (batchEntry.data).unsqueeze(0); // add a batch dimension of 1
        auto newBatchEntryData = this->net.forwardConLayer(batchEntryData);
        batchEntry.data = newBatchEntryData; // fixed-size feature vector, safe to stack
        batch.push_back(batchEntry);
    }
    return batch;
}
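For this to compile, the dataset needs access to the net, so I pass a reference into the dataset's constructor. A sketch (everything except the net member is as before):
class MyDataSet : public torch::data::datasets::Dataset<MyDataSet> {
public:
    MyDataSet(const std::string &pathToData, Net &net) : net(net) {
        // collect paths and targets as before
    }

    // get(), size(), and the get_batch() override from above

private:
    Net &net; // used by get_batch to run forwardConLayer
};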
Lastly, I call forwardDenseLayer at all the places where I would normally call forward, e.g.:
for (torch::data::Example<> &batch : *dataLoader) {
    auto data = batch.data; // already passed through the conv layers in get_batch
    auto target = batch.target.squeeze();
    auto output = model.forwardDenseLayer(data);
    auto loss = torch::mse_loss(output, target);
    LOG(INFO) << "Batch loss: " << loss.item<double>();
    loss.backward();
    optimizer.step(); // zero_grad() already happened in get_batch
}
Update
This solution seems to cause an error if the data loader's number of workers isn't 0. The error is:
terminate called after throwing an instance of 'std::runtime_error'
what(): one of the variables needed for gradient computation has been modified by an inplace operation: [CPUFloatType [3, 12, 3, 3]] is at version 2; expected version 1 instead. ...
This error makes sense: the data passes through the CNN's head during the batching process, so with multiple workers the prefetched forward passes interleave with the optimizer's in-place weight updates. The solution to this "problem" is to set the number of workers to 0.
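That is, the data loader from above is constructed with workers(0):
auto dataLoader = torch::data::make_data_loader(
        std::move(set),
        torch::data::DataLoaderOptions().batch_size(batchSize).workers(0)
);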