image-segmentation, medical, niftynet

Understanding input shape (spatial_window_size) for Niftynet


I am using NiftyNet for medical image segmentation. I have seen a great tutorial on the shape constraints of the spatial_window_size parameter here: https://nbviewer.jupyter.org/gist/fepegar/1fb865494cb44ac043c3189ec415d411.

But how do I choose between the possible shapes? What is the logic behind them? When should I choose a bigger or smaller spatial_window_size? What is important when setting this parameter for the image, for the labels, and for inference? Why are the sizes for the label and the image different? I am also interested in how the border parameter affects this choice.


Solution

  • The spatial_window_size parameter defines the size of the crop (patch) sampled from your input images during training.

    What is important when setting this parameter for the image, for the labels, and for inference?

    This parameter should be the same in the [TRAINING] and [INFERENCE] sections, because the pipeline uses the spatial_window_size to aggregate the patches back into the original resolution. Choosing the initial window size depends on the shapes your CNN architecture can accept, the dimensionality of your input (2D slices vs. 3D voxels), and your memory budget (too large a window may not fit in GPU memory).
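    For concreteness, here is a minimal, hypothetical slice of a NiftyNet configuration showing where the window size is typically declared. The section and parameter names follow the standard NiftyNet config layout, but the paths and sizes are placeholders you would tune for your own data and GPU:

    ```
    # Input source sections: each source declares the window sampled from it
    [image]
    path_to_search = ./data/images
    filename_contains = img
    # 3D patch sampled from each image volume (placeholder size)
    spatial_window_size = (96, 96, 96)
    interp_order = 3

    [label]
    path_to_search = ./data/labels
    filename_contains = seg
    spatial_window_size = (96, 96, 96)
    # nearest-neighbour interpolation so label values stay integers
    interp_order = 0

    [INFERENCE]
    # same window size, so patches can be aggregated back into the original resolution
    spatial_window_size = (96, 96, 96)
    border = (8, 8, 8)
    ```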

    When should you choose a bigger or smaller spatial_window_size?

    In general, larger patch sizes are preferable (they've been observed to produce slightly better performance), and I'd refer you to this answer for the rationale. However, it depends on your specific dataset, so I'd recommend you experiment with different patch sizes.

    Alternatively, you can use a technique called Building Up Sizes (refer to tip #9), where you start training with a smaller spatial_window_size, then increase the size and continue training the same model, to reduce overfitting and improve overall performance. Note that this only works if you use fully convolutional CNNs or CNNs with some form of spatial pyramid pooling (so that the input image size does not matter); a rough sketch of such a schedule is shown below.
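    As a sketch only (the iteration counts and window sizes are purely illustrative, and the two [TRAINING] blocks would live in two separate config files run one after the other against the same model directory):

    ```
    # --- first config file: warm up with a small window ---
    [image]
    spatial_window_size = (64, 64, 64)
    [label]
    spatial_window_size = (64, 64, 64)
    [TRAINING]
    max_iter = 10000

    # --- second config file: resume from the saved checkpoint with a larger window ---
    [image]
    spatial_window_size = (96, 96, 96)
    [label]
    spatial_window_size = (96, 96, 96)
    [TRAINING]
    starting_iter = 10000
    max_iter = 20000
    ```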

    Why are sizes for the label and image different?

    This question needs more clarification (e.g. your config and the original image resolution), but unfortunately I do not have enough reputation to comment.

    How does the border parameter affect this choice?

    The border parameter in the [INFERENCE] section removes the padding added by the volume_padding_size parameter in the [NETWORK] section. As per the configuration documentation, the border should be at least floor((N - D) / 2), where N is an element of the original voxel/slice size and D is the corresponding element of the network's output voxel/slice size (the spatial_window_size). For 2D window sizes (e.g. 96 x 96 x 1), the border could be (96, 96, 0); the last element must be 0. The border parameter is therefore determined by the window size and does not affect how we choose it. Rather, we choose the border based on how much we want the network to focus on the outer pixels relative to the centre pixels.
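    As a purely illustrative instance of that rule (the 160 x 160 in-plane size is an assumed value, not taken from the question): with original slices of 160 x 160 x 1 and a 96 x 96 x 1 window, floor((160 - 96) / 2) = 32, giving:

    ```
    [INFERENCE]
    # 2D window: the last element is 1 for the size and 0 for the border
    spatial_window_size = (96, 96, 1)
    # floor((160 - 96) / 2) = 32 along each in-plane axis
    border = (32, 32, 0)
    ```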