image-processing deep-learning neural-network gaussian-process

Multi-scale Convolutional Neural Network for Image deblurring


I recently reproduced a deep learning model for image deblurring from the paper "Deep Multi-scale Convolutional Neural Network for Dynamic Scene Deblurring", published in 2017. The paper mentions that, to exploit coarse and middle level information while preserving fine level information at the same time, input and output to the network take the form of Gaussian pyramids.

As shown in the image, the model architecture is composed of three levels. I wonder why the network uses three levels rather than some other number, which doesn't seem to be explained in detail in the paper. I read up on Gaussian pyramids and found that the number of levels does not seem to be fixed. I also searched for the influence of the number of levels in a Gaussian pyramid and found no related posts, so I am also curious about how the number of levels affects the result and how to set it reasonably.

[image: model architecture]

I would appreciate it if you could answer my question or point me to some learning resources related to the problem.


Solution

  • According to the paper you reference, the authors use a Gaussian pyramid with 3 levels, a number they denote by the variable K:

    We define scale levels in the order of decreasing resolution (i.e. level 1 for finest scale). Unless denoted otherwise, we use total K = 3 scales.

    Next, the authors explain that they use a 3-scale Gaussian pyramid to "exploit coarse and middle level information while preserving fine level information at the same time". Note that the number of levels and the number of scales are the same.

    In our model, finer scale image deblurring is aided by coarser scale features. To exploit coarse and middle level information while preserving fine level information at the same time, input and output to our network take the form of Gaussian pyramids.

    Lastly, the authors end that paragraph by contrasting their approach, which uses 3 images, with other approaches that use a single image:

    Note that most of other coarse-to-fine networks take a single image as input and output.

    In the single-image approach that the authors are contrasting against, you would not need a Gaussian pyramid, as you would pass a "single image as input and output".
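
    To make the idea of a K-level pyramid concrete, here is a minimal pure-Python sketch. It is not the authors' code: the 5-tap binomial kernel, the factor-2 subsampling, the border handling, and all function names are my own assumptions, chosen only to illustrate how each coarser level is a blurred and subsampled copy of the previous one.

```python
# Sketch: build a K-level Gaussian pyramid (level 1 = finest), pure Python.
# Assumptions (not from the paper): 5-tap binomial kernel, factor-2
# subsampling, edge replication at borders, grayscale image as list of rows.

KERNEL = (1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16)


def blur_1d(values):
    """Blur one row or column with the binomial kernel, replicating edges."""
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        for offset, weight in zip(range(-2, 3), KERNEL):
            j = min(max(i + offset, 0), n - 1)  # clamp index at the borders
            acc += weight * values[j]
        out.append(acc)
    return out


def blur(image):
    """Separable Gaussian-like blur: rows first, then columns."""
    rows = [blur_1d(row) for row in image]
    cols = [blur_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]  # transpose back to rows


def downsample(image):
    """Blur, then keep every second pixel in both directions."""
    blurred = blur(image)
    return [row[::2] for row in blurred[::2]]


def gaussian_pyramid(image, num_levels=3):
    """Return [level 1 (finest), ..., level K (coarsest)]."""
    levels = [image]
    for _ in range(num_levels - 1):
        levels.append(downsample(levels[-1]))
    return levels


# Hypothetical 16x16 test image, used only to show the level sizes.
image = [[float((x + y) % 7) for x in range(16)] for y in range(16)]
pyramid = gaussian_pyramid(image, num_levels=3)
sizes = [(len(level), len(level[0])) for level in pyramid]
print(sizes)  # [(16, 16), (8, 8), (4, 4)]
```

    With `num_levels=1` the function simply returns the original image, which corresponds to the single-image case.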

    To answer your first question:

    I wonder why the network uses three levels rather than some other number, which doesn't seem to be explained in detail in the paper.

    Based on the information above, the authors indicate that their approach uses K=3, in contrast to other coarse-to-fine networks, which take a single image (K=1). They don't explicitly explain why they chose 3 instead of 4 or 5 but, as I mentioned, they describe using the K=3 pyramid to exploit coarse, middle, and fine level information. That seems to indicate that K should be at least 3 to make use of the information available at those three levels.

    As for your second question:

    I also searched for the influence of the number of levels in a Gaussian pyramid and found no related posts, so I am also curious about how the number of levels affects the result and how to set it reasonably.

    This question asks for the opinion of the person answering, which is not how StackOverflow works (see here for opinions vs facts policy). I cannot tell you how to set the number reasonably, but I can suggest the following questions to help you decide:

    1. Why would increasing the number of levels in the Gaussian pyramid affect performance?
    2. What extra information, if any, would additional levels of the Gaussian pyramid provide that would be useful?

    If you can answer those questions, then that should help you decide if it is worth pursuing K=4, K=5, etc.
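
    One way to start reasoning about those two questions is to look at what each extra level would actually contain. The sketch below is only arithmetic, not anything taken from the paper: the 256x256 input size, the 40-pixel blur length, the factor-2 subsampling, and the function names are hypothetical values I picked for illustration.

```python
def pyramid_sizes(height, width, num_levels, factor=2):
    """Spatial size at each level, level 1 being the finest."""
    sizes = []
    for _ in range(num_levels):
        sizes.append((height, width))
        height = max(1, height // factor)
        width = max(1, width // factor)
    return sizes


def blur_extent(extent_px, level, factor=2):
    """Apparent blur length in pixels at a given level (level 1 = finest)."""
    return extent_px / factor ** (level - 1)


# Hypothetical 256x256 input with a 40-pixel blur streak at full resolution.
for k in (3, 4, 5):
    sizes = pyramid_sizes(256, 256, k)
    coarsest_blur = blur_extent(40, k)
    print(f"K={k}: sizes={sizes}, blur at coarsest level ~ {coarsest_blur:g} px")
```

    Each extra level shrinks the apparent blur, which may make large blurs easier to handle at the coarsest scale, but it also leaves fewer pixels to work with (16x16 at K=5 in this example). Whether that trade-off pays off is something you would have to check experimentally on your own data.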