tensorflow machine-learning neural-network computer-vision

Feeding a complex-valued image into a neural network

I'm trying to "learn" a relationship between a set of around 10k complex-valued input images (amplitude/phase; real/imag) and a real-valued output vector with 48 entries. This output vector is not a set of labels but a set of numbers representing the best parameters for optimizing the visual impression of the given complex-valued image. These parameters are generated by an algorithm. The data may contain some noise, coming both from the images and from the algorithm that generates the parameter vector.

These parameters more or less depend on the FFT (Fast Fourier Transform) of the input image. Therefore, I was thinking of feeding the network (5 hidden layers, but the architecture shouldn't matter right now) a 1D-reshaped version of FFT(complexImage). Some pseudocode:

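     % discretize spectrum: centered 2D FFT of the complex-valued image
     obj_ft = fftshift(fft2(object));
     
     % split the complex spectrum into its real and imaginary parts
     obj_real_2d = real(obj_ft);
     obj_imag_2d = imag(obj_ft);
     
     % flatten each 2D array into a 1D row vector (column-major order)
     obj_real_1d = reshape(obj_real_2d, 1, []);
     obj_imag_1d = reshape(obj_imag_2d, 1, []);
     
     % concatenate real and imaginary parts into one real-valued row
     % (twice as long as the number of pixels; not a complex variable)
     obj_complx_1d(index, :) = [obj_real_1d obj_imag_1d];
     
     % target: the 48 optimized parameters produced by the algorithm
     opt_param_1D(index, :) = get_opt_param(object);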
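
For reference, here is a rough NumPy sketch of the same preprocessing, which could be passed to a TensorFlow model directly. It assumes the images are already loaded as complex NumPy arrays; the function names and the random 32x32 test images are hypothetical, and the targets from `get_opt_param` are not reproduced. The inverse helper is only there to show that splitting the spectrum into real and imaginary parts loses no information.

```python
import numpy as np


def complex_image_to_features(image: np.ndarray) -> np.ndarray:
    """Map one complex-valued 2D image to a real-valued 1D feature vector.

    Mirrors the pseudocode: centered 2D FFT, then the flattened real part
    followed by the flattened imaginary part.
    """
    spectrum = np.fft.fftshift(np.fft.fft2(image))
    # ravel() flattens in row-major (C) order; MATLAB's reshape is
    # column-major, but any fixed ordering works if used consistently.
    return np.concatenate([spectrum.real.ravel(), spectrum.imag.ravel()])


def features_to_spectrum(features: np.ndarray, shape: tuple) -> np.ndarray:
    """Inverse mapping: rebuild the centered complex spectrum from the features."""
    n = features.size // 2
    return (features[:n] + 1j * features[n:]).reshape(shape)


# Hypothetical data: 4 random complex images of 32x32 pixels.
rng = np.random.default_rng(0)
images = rng.standard_normal((4, 32, 32)) + 1j * rng.standard_normal((4, 32, 32))

# One row per image, like obj_complx_1d(index, :) in the pseudocode.
X = np.stack([complex_image_to_features(img) for img in images]).astype(np.float32)
print(X.shape)  # (4, 2048)
```

Since the FFT is invertible, this representation carries exactly the same information as the raw real/imaginary pixels; the choice between the two is about which one the network finds easier to learn from.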

I was wondering whether there is a better approach for feeding complex-valued images into a deep network. I'd like to avoid complex-valued gradients, since they don't seem necessary here: I'm "just" trying to find a "black box" that outputs the optimized parameters when given a new image.

For training, TensorFlow receives obj_complx_1d as the input and opt_param_1D as the target output vector.

Solution

There are several ways you can treat complex signals as input.

Use a transform to turn them into 'images'. For example, Short-Time Fourier Transforms (STFTs) are used to produce spectrograms, which are 2D: the x-axis is time and the y-axis is frequency. If your input data is complex, you may choose to look only at the magnitude spectrum or at the power spectral density of the transformed data.
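A minimal NumPy sketch of the spectrogram idea for a 1D complex signal such as I/Q samples. The Hann window, the frame length of 64 and the hop of 32 are arbitrary illustrative choices, and the test tone is hypothetical. For complex input the full FFT is used rather than a real-input FFT, because positive and negative frequencies carry different information; taking the magnitude discards the phase.

```python
import numpy as np


def stft_magnitude(signal: np.ndarray, frame_len: int = 64, hop: int = 32) -> np.ndarray:
    """Magnitude spectrogram of a (possibly complex) 1D signal.

    Returns shape (frame_len, n_frames): rows are frequency bins, centered
    with fftshift so negative frequencies come first; columns are time.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping windowed frames: (n_frames, frame_len)
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    spectra = np.fft.fftshift(np.fft.fft(frames, axis=1), axes=1)
    return np.abs(spectra).T  # transpose to (frequency, time)


# Hypothetical test signal: a complex tone at 1/8 of the sample rate.
n = np.arange(1024)
x = np.exp(2j * np.pi * 0.125 * n)
S = stft_magnitude(x)
power = S ** 2  # power spectrogram, if preferred over magnitude
print(S.shape)  # (64, 31)
```

For a 2D complex image like the one in the question, the analogous magnitude-only input would be `np.abs(np.fft.fftshift(np.fft.fft2(image)))`, or its square for power.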

Something else I've seen in practice is to treat the in-phase and quadrature (I/Q, i.e. real/imaginary) channels separately in the early layers of the network and to operate across both in the higher layers. The early layers learn characteristics of each channel; the higher layers learn the relationship between the I and Q channels.
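A minimal sketch of this split-then-merge structure as a plain NumPy forward pass with random, untrained weights; the layer sizes and function names are hypothetical. Everything is real-valued, so a framework such as TensorFlow would train it with ordinary real gradients. In Keras, the same shape could be built with the functional API: two `Dense` branches merged by a `Concatenate` layer.

```python
import numpy as np

rng = np.random.default_rng(0)


def dense(x, w, b, relu=True):
    """Fully connected layer with optional ReLU."""
    y = x @ w + b
    return np.maximum(y, 0.0) if relu else y


def init(n_in, n_out):
    """He-style random weights and zero biases (illustrative, untrained)."""
    return rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in), np.zeros(n_out)


n_pixels, n_hidden, n_out = 1024, 128, 48  # hypothetical sizes

# Early layers: separate weights for the I (real) and Q (imaginary) channels.
w_i, b_i = init(n_pixels, n_hidden)
w_q, b_q = init(n_pixels, n_hidden)
# Higher layers: operate on both channels jointly.
w_joint, b_joint = init(2 * n_hidden, n_hidden)
w_out, b_out = init(n_hidden, n_out)


def forward(z):
    """z: complex array of shape (batch, n_pixels) -> real array (batch, n_out)."""
    h_i = dense(z.real, w_i, b_i)  # sees only the real part
    h_q = dense(z.imag, w_q, b_q)  # sees only the imaginary part
    h = dense(np.concatenate([h_i, h_q], axis=1), w_joint, b_joint)
    return dense(h, w_out, b_out, relu=False)  # linear output for regression


batch = rng.standard_normal((8, n_pixels)) + 1j * rng.standard_normal((8, n_pixels))
out = forward(batch)
print(out.shape)  # (8, 48)
```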

These researchers do a lot of work with complex signals and neural nets. In particular, check out 'Convolutional Radio Modulation Recognition Networks':

https://radioml.com/research/