Tags: c++, image-processing, fft, convolution, kissfft

Gaussian Blur with FFT Questions


I have a current implementation of Gaussian Blur using regular convolution. It is efficient enough for small kernels, but once the kernel size gets a little bigger, the performance takes a hit. So I am thinking of implementing the convolution using an FFT. I've never had any experience with FFT-based image processing, so I have a few questions.

  1. Is a 2D FFT-based convolution also separable into two 1D convolutions?

    • If true, does it go like this: 1D FFT on every row, then 1D FFT on every column, then multiply with the 2D kernel, and then inverse transform of every column and inverse transform of every row? Or do I have to multiply with a 1D kernel after each 1D FFT?
  2. Now I understand that the kernel size should be the same size as the image (a row, in the 1D case). But how will it affect the edges? Do I have to pad the image edges with zeros? If so, should the kernel size be equal to the image size before or after padding?

Also, this is a C++ project, and I plan on using kissFFT, since this is a commercial project. You are welcome to suggest any better alternatives. Thank you.

EDIT: Thanks for the responses, but I have a few more questions.

  1. I see that the imaginary part of the input image will be all zeros. But will the output imaginary part also be all zeros? Do I have to multiply the Gaussian kernel with both the real and imaginary parts?

  2. I have instances of the same image to be blurred at different scales, i.e. the same image is scaled to different sizes and blurred with different kernel sizes. Do I have to perform an FFT every time I scale the image, or can I reuse the same FFT?

  3. Lastly, if I wanted to visualize the FFT, I understand that a log scale has to be applied. But I am really lost on which part should be used to visualize the FFT: the real part or the imaginary part?

  4. Also, for an image of size 512x512, what will be the size of the real and imaginary parts? Will they be the same length?

Thank you again for your detailed replies.


Solution

    1. The 2-D FFT is separable and you are correct in how to perform it, except that you must multiply by the 2-D FFT of the 2-D kernel (see the sketch after this list). If you are using kissfft, an easier way to perform the 2-D FFT is to just use kiss_fftnd in the tools directory of the kissfft package. This will do multi-dimensional FFTs.

    2. The kernel does not have to be any particular size. If the kernel is smaller than the image, you just need to zero-pad it up to the image size before performing the 2-D FFT. You should also zero-pad the image edges, since the convolution being performed by multiplication in the frequency domain is actually circular convolution, and the results wrap around at the edges.
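
    In case it helps, here is a minimal sketch of the separability mentioned in point 1: a 2-D FFT computed with plain kiss_fft as 1-D transforms over every row and then over every column. The helper name and the row-major buffer layout are choices of this sketch, not anything defined by kissfft.

```cpp
#include <algorithm>
#include <vector>
#include "kiss_fft.h"

// In-place 2-D FFT of a W x H complex buffer, done as separable 1-D passes.
void fft2d_separable(std::vector<kiss_fft_cpx>& data, int W, int H)
{
    // One configuration per transform length.
    kiss_fft_cfg row_cfg = kiss_fft_alloc(W, /*inverse=*/0, nullptr, nullptr);
    kiss_fft_cfg col_cfg = kiss_fft_alloc(H, /*inverse=*/0, nullptr, nullptr);

    std::vector<kiss_fft_cpx> tmp(std::max(W, H));

    // 1-D FFT of every row (rows are contiguous in this row-major layout).
    for (int y = 0; y < H; ++y) {
        kiss_fft(row_cfg, &data[y * W], tmp.data());
        std::copy(tmp.begin(), tmp.begin() + W, data.begin() + y * W);
    }

    // 1-D FFT of every column (gather the strided column, transform, scatter back).
    std::vector<kiss_fft_cpx> col(H);
    for (int x = 0; x < W; ++x) {
        for (int y = 0; y < H; ++y) col[y] = data[y * W + x];
        kiss_fft(col_cfg, col.data(), tmp.data());
        for (int y = 0; y < H; ++y) data[y * W + x] = tmp[y];
    }

    kiss_fft_free(row_cfg);
    kiss_fft_free(col_cfg);
}
```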

    So to summarize (given that the image size is M x N; a code sketch of these steps follows the list):

    1. come up with a 2-D kernel of any size (U x V)
    2. zero-pad the kernel up to (M+U-1) x (N+V-1)
    3. take the 2-D fft of the kernel
    4. zero-pad the image up to (M+U-1) x (N+V-1)
    5. take the 2-D FFT of the image
    6. multiply FFT of kernel by FFT of image
    7. take inverse 2-D FFT of result
    8. trim off garbage at edges
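
    Putting the steps together, here is a compact sketch using kiss_fftnd from the kissfft tools directory, as suggested in point 1. The helper names (zero_pad, fft2d, fft_blur), the row-major layout, the slowest-varying-first ordering of dims, and the choice of which region to trim are assumptions of this sketch; also note that kissfft's inverse transform is unscaled, hence the division by P*Q at the end.

```cpp
#include <vector>
#include "kiss_fftnd.h"   // multi-dimensional FFT from the kissfft tools directory

// Embed a real H x W array into a zero-filled complex P x Q buffer (steps 2 and 4).
static std::vector<kiss_fft_cpx> zero_pad(const std::vector<float>& src,
                                          int H, int W, int P, int Q)
{
    std::vector<kiss_fft_cpx> dst(P * Q);          // value-initialized to zero
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x)
            dst[y * Q + x].r = src[y * W + x];
    return dst;
}

// Forward or inverse 2-D FFT via kiss_fftnd; dims listed slowest-varying first.
static void fft2d(const std::vector<kiss_fft_cpx>& in,
                  std::vector<kiss_fft_cpx>& out, int P, int Q, bool inverse)
{
    int dims[2] = { P, Q };
    kiss_fftnd_cfg cfg = kiss_fftnd_alloc(dims, 2, inverse ? 1 : 0, nullptr, nullptr);
    kiss_fftnd(cfg, in.data(), out.data());
    kiss_fft_free(cfg);
}

// image is M x N, kernel is U x V; returns an M x N blurred image (steps 1-8).
std::vector<float> fft_blur(const std::vector<float>& image,  int M, int N,
                            const std::vector<float>& kernel, int U, int V)
{
    const int P = M + U - 1, Q = N + V - 1;        // padded size

    // Steps 1-3: pad and transform the kernel (reusable across images of this size).
    std::vector<kiss_fft_cpx> kerF(P * Q);
    fft2d(zero_pad(kernel, U, V, P, Q), kerF, P, Q, false);

    // Steps 4-5: pad and transform the image.
    std::vector<kiss_fft_cpx> imgF(P * Q);
    fft2d(zero_pad(image, M, N, P, Q), imgF, P, Q, false);

    // Step 6: pointwise complex multiplication in the frequency domain.
    std::vector<kiss_fft_cpx> prod(P * Q);
    for (int i = 0; i < P * Q; ++i) {
        prod[i].r = imgF[i].r * kerF[i].r - imgF[i].i * kerF[i].i;
        prod[i].i = imgF[i].r * kerF[i].i + imgF[i].i * kerF[i].r;
    }

    // Step 7: inverse 2-D FFT. kissfft does not scale, so divide by P*Q; the
    // imaginary parts should be ~0 for real inputs and are discarded.
    std::vector<kiss_fft_cpx> full(P * Q);
    fft2d(prod, full, P, Q, true);

    // Step 8: trim back to M x N. For a kernel centered at (U/2, V/2) the useful
    // region is offset by that amount; adjust if your kernel is centered differently.
    std::vector<float> out(M * N);
    for (int y = 0; y < M; ++y)
        for (int x = 0; x < N; ++x)
            out[y * N + x] = full[(y + U / 2) * Q + (x + V / 2)].r / float(P * Q);
    return out;
}
```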

    If you are performing the same filter multiple times on different images, you don't have to perform steps 1-3 every time.

    Note: The kernel size will have to be rather large for this to be faster than direct computation of convolution. Also, did you implement your direct convolution taking advantage of the fact that a 2-D Gaussian filter is separable (see this a few paragraphs into the "Mechanics" section)? That is, you can perform the 2-D convolution as 1-D convolutions on the rows and then the columns, as sketched below. I have found this to be faster than most FFT-based approaches unless the kernels are quite large.
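
    Here is a rough sketch of that separable direct convolution: one 1-D Gaussian pass over the rows, then one over the columns. The clamp-to-edge border handling and the in-place interface are choices of this sketch, not requirements.

```cpp
#include <algorithm>
#include <vector>

// img is a W x H row-major image; g is a 1-D Gaussian kernel of odd length.
void separable_gaussian_blur(std::vector<float>& img, int W, int H,
                             const std::vector<float>& g)
{
    const int R = static_cast<int>(g.size()) / 2;   // kernel radius
    std::vector<float> tmp(W * H);

    // Horizontal pass: 1-D convolution along every row.
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x) {
            float acc = 0.0f;
            for (int k = -R; k <= R; ++k)
                acc += g[k + R] * img[y * W + std::clamp(x + k, 0, W - 1)];
            tmp[y * W + x] = acc;
        }

    // Vertical pass: 1-D convolution along every column.
    for (int y = 0; y < H; ++y)
        for (int x = 0; x < W; ++x) {
            float acc = 0.0f;
            for (int k = -R; k <= R; ++k)
                acc += g[k + R] * tmp[std::clamp(y + k, 0, H - 1) * W + x];
            img[y * W + x] = acc;
        }
}
```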

    Response to Edit

    1. If the input is real, the output will still be complex except for rare circumstances. The FFT of your gaussian kernel will also be complex, so the multiply must be a complex multiplication. When you perform the inverse FFT, the output should be real since your input image and kernel are real. The output will be returned in a complex array, but the imaginary components should be zero or very small (floating point error) and can be discarded.

    2. If you are using the same image, you can reuse the image FFT, but you will need to zero pad based on your biggest kernel size. You will have to compute the FFTs of all of the different kernels.

    3. For visualization, the magnitude of the complex output should be used. The log scale just helps to visualize smaller components of the output when larger components would drown them out on a linear scale. The decibel scale is often used and is given by either 20*log10(abs(x)) or 10*log10(x*x'), which are equivalent (x is the complex FFT output and x' is the complex conjugate of x).
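
    For what it's worth, here is a small sketch of that conversion (magnitude, then 20*log10) applied to kissfft's complex output. The small epsilon guarding log10(0) is just an implementation choice.

```cpp
#include <cmath>
#include <vector>
#include "kiss_fft.h"

// Convert a complex spectrum to a log-magnitude (decibel) image for display.
std::vector<float> fft_magnitude_db(const std::vector<kiss_fft_cpx>& spectrum)
{
    std::vector<float> db(spectrum.size());
    for (std::size_t i = 0; i < spectrum.size(); ++i) {
        float mag = std::sqrt(spectrum[i].r * spectrum[i].r +
                              spectrum[i].i * spectrum[i].i);   // abs(x)
        db[i] = 20.0f * std::log10(mag + 1e-12f);               // 20*log10(abs(x))
    }
    return db;
}
```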

    4. The input and output of the FFT will be the same size, and the real and imaginary parts will be the same size, since one real value and one imaginary value together form a single complex sample. So for a 512x512 input to the FFT, the real and imaginary parts of the output will each be 512x512.