I was helped before in this answer to realise an in-place transform and it works well but ONLY if I start with real data. If I start with complex data, the results after IFT+FFT are wrong, and this happens only in the in-place version, I have perfect results with an out-of-place version of this transform. This is the code:
#include <stdio.h>
#include <stdlib.h>
#include <cuda_runtime.h>
#include <complex.h>
#include <cuComplex.h>
#include <cufft.h>
#include <cufftXt.h>
#define N 4
#define N_PAD ( 2*(N/2+1) )
void print_3D_Real(double *array){
printf("\nPrinting 3D real matrix \n");
unsigned long int idx;
for (int z = 0; z < N; z++){
printf("---------------------------------------------------------------------------- plane %d below\n", z);
for (int x = 0; x < N; x++){
for (int y = 0; y < N; y++){
idx = z + N_PAD * (y + x * N);
printf("%.3f \t", array[idx]);
}
printf("\n");
}
}
}
void print_3D_Comp(cuDoubleComplex *array){
printf("\nPrinting 3D complex matrix \n");
unsigned long int idx;
for (int z = 0; z < (N/2+1); z++){
printf("---------------------------------------------------------------------------- plane %d below\n", z);
for (int x = 0; x < N; x++){
for (int y = 0; y < N; y++){
idx = z + (N/2+1) * (y + x * N);
printf("%+.3f%+.3fi \t", array[idx].x, array[idx].y);
}
printf("\n");
}
}
}
// Main function
int main(int argc, char **argv){
CU_ERR_CHECK( cudaSetDevice(0) );
unsigned long int idx, in_mem_size, out_mem_size;
cuDoubleComplex *in = NULL, *d_in = NULL;
double *out = NULL, *d_out = NULL;
cufftHandle plan_r2c, plan_c2r;
in_mem_size = sizeof(cuDoubleComplex) * N*N*(N/2+1);
out_mem_size = in_mem_size;
in = (cuDoubleComplex *) malloc (in_mem_size);
out = (double *) in;
cudaMalloc((void **)&d_in, in_mem_size);
d_out = (double *) d_in;
cufftPlan3d(&plan_c2r, N, N, N, CUFFT_Z2D);
cufftPlan3d(&plan_r2c, N, N, N, CUFFT_D2Z);
memset(in, 0, in_mem_size);
memset(out, 0, out_mem_size);
// Initial complex data
for (int x = 0; x < N; x++){
for (int y = 0; y < N; y++){
for (int z = 0; z < (N/2+1); z++){
idx = z + (N/2+1) * (y + x * N);
in[idx].x = idx;
}
}
}
print_3D_Comp(in);
cudaMemcpy(d_in, in, in_mem_size, cudaMemcpyHostToDevice);
cufftExecZ2D(plan_c2r, (cufftDoubleComplex *)d_in, (cufftDoubleReal *)d_out);
cudaMemcpy(out, d_out, out_mem_size, cudaMemcpyDeviceToHost);
// Normalisation
for (int i = 0; i < N*N*N_PAD; i++)
out[i] /= (N*N*N);
print_3D_Real(out);
cudaMemcpy(d_out, out, out_mem_size, cudaMemcpyHostToDevice);
cufftExecD2Z(plan_r2c, (cufftDoubleReal *)d_out, (cufftDoubleComplex *)d_in);
cudaMemcpy(in, d_in, in_mem_size, cudaMemcpyDeviceToHost) );
print_3D_Comp(in);
cudaDeviceReset();
return 0;
}
The output of my program is on this pastebin.
Can someone direct me on the right path? Thank you very much in advance.
First of all, your code doesn't compile.
In its most general definition, the fourier transform performs a mapping from one complex domain to another complex domain, and this operation should be reversible.
However, the C2R and R2C are special cases, with an assumption that the signal is completely representable in one of the 2 domains (the "time" domain) as a purely real signal (all imaginary components are zero).
However, it should be evident that there will be some complex "frequency" domain representations that cannot be represented by a purely real time domain signal. If the counter case were true (any complex frequency domain signal can be represented as a purely real time domain signal) then the FFT could not be reversible for a complex time domain signal (since all frequency domain data sets map to purely real time domain data sets.)
Therefore you cannot choose arbitrary data in the frequency domain, and expect it to map correctly into a purely real time domain signal. (*)
As a demonstration, change your input data set to the following:
in[idx].x = (idx)?0:1;
and I believe you will get a "passing" test case.
Furthermore, your allegation that "I have perfect results with an out-of-place version of this transform" I believe cannot be supported, if you are in fact using this particular data set as posted in your question. If you disagree, please post a complete code demonstrating your passing test case with the out-of-place transform, that is otherwise identical to your posted code.
Finally, we can test this with fftw. A conversion of your program to use fftw instead of cufft produces exactly the same output:
$ cat t355.cpp
#include <stdio.h>
#include <stdlib.h>
#include <fftw3.h>
#include <string.h>
#define N 4
#define N_PAD ( 2*(N/2+1) )
void print_3D_Real(double *array){
printf("\nPrinting 3D real matrix \n");
unsigned long int idx;
for (int z = 0; z < N; z++){
printf("---------------------------------------------------------------------------- plane %d below\n", z);
for (int x = 0; x < N; x++){
for (int y = 0; y < N; y++){
idx = z + N_PAD * (y + x * N);
printf("%.3f \t", array[idx]);
}
printf("\n");
}
}
}
void print_3D_Comp(fftw_complex *array){
printf("\nPrinting 3D complex matrix \n");
unsigned long int idx;
for (int z = 0; z < (N/2+1); z++){
printf("---------------------------------------------------------------------------- plane %d below\n", z);
for (int x = 0; x < N; x++){
for (int y = 0; y < N; y++){
idx = z + (N/2+1) * (y + x * N);
printf("%+.3f%+.3fi \t", array[idx][0], array[idx][1]);
}
printf("\n");
}
}
}
// Main function
int main(int argc, char **argv){
unsigned long int idx, in_mem_size, out_mem_size;
fftw_complex *in = NULL;
double *out = NULL;
in_mem_size = sizeof(fftw_complex) * N*N*(N/2+1);
out_mem_size = in_mem_size;
in = (fftw_complex *) malloc (in_mem_size);
out = (double *) in;
memset(in, 0, in_mem_size);
memset(out, 0, out_mem_size);
// Initial complex data
for (int x = 0; x < N; x++){
for (int y = 0; y < N; y++){
for (int z = 0; z < (N/2+1); z++){
idx = z + (N/2+1) * (y + x * N);
in[idx][0] = idx;
}
}
}
print_3D_Comp(in);
fftw_plan plan_c2r = fftw_plan_dft_c2r_3d(N, N, N, in, out, FFTW_ESTIMATE);
fftw_plan plan_r2c = fftw_plan_dft_r2c_3d(N, N, N, out, in, FFTW_ESTIMATE);
fftw_execute(plan_c2r);
// Normalisation
for (int i = 0; i < N*N*N_PAD; i++)
out[i] /= (N*N*N);
print_3D_Real(out);
fftw_execute(plan_r2c);
print_3D_Comp(in);
return 0;
}
$ g++ t355.cpp -o t355 -lfftw3
$ ./t355
Printing 3D complex matrix
---------------------------------------------------------------------------- plane 0 below
+0.000+0.000i +3.000+0.000i +6.000+0.000i +9.000+0.000i
+12.000+0.000i +15.000+0.000i +18.000+0.000i +21.000+0.000i
+24.000+0.000i +27.000+0.000i +30.000+0.000i +33.000+0.000i
+36.000+0.000i +39.000+0.000i +42.000+0.000i +45.000+0.000i
---------------------------------------------------------------------------- plane 1 below
+1.000+0.000i +4.000+0.000i +7.000+0.000i +10.000+0.000i
+13.000+0.000i +16.000+0.000i +19.000+0.000i +22.000+0.000i
+25.000+0.000i +28.000+0.000i +31.000+0.000i +34.000+0.000i
+37.000+0.000i +40.000+0.000i +43.000+0.000i +46.000+0.000i
---------------------------------------------------------------------------- plane 2 below
+2.000+0.000i +5.000+0.000i +8.000+0.000i +11.000+0.000i
+14.000+0.000i +17.000+0.000i +20.000+0.000i +23.000+0.000i
+26.000+0.000i +29.000+0.000i +32.000+0.000i +35.000+0.000i
+38.000+0.000i +41.000+0.000i +44.000+0.000i +47.000+0.000i
Printing 3D real matrix
---------------------------------------------------------------------------- plane 0 below
23.500 -1.500 -1.500 -1.500
-6.000 0.000 0.000 0.000
-6.000 0.000 0.000 0.000
-6.000 0.000 0.000 0.000
---------------------------------------------------------------------------- plane 1 below
-0.500 0.750 0.000 -0.750
3.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000
-3.000 0.000 0.000 0.000
---------------------------------------------------------------------------- plane 2 below
0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000
---------------------------------------------------------------------------- plane 3 below
-0.500 -0.750 0.000 0.750
-3.000 0.000 0.000 0.000
0.000 0.000 0.000 0.000
3.000 0.000 0.000 0.000
Printing 3D complex matrix
---------------------------------------------------------------------------- plane 0 below
+0.000+0.000i +6.000+0.000i +6.000+0.000i +6.000+0.000i
+24.000+0.000i +30.000+0.000i +30.000+0.000i +30.000+0.000i
+24.000+0.000i +30.000+0.000i +30.000+0.000i +30.000+0.000i
+24.000+0.000i +30.000+0.000i +30.000+0.000i +30.000+0.000i
---------------------------------------------------------------------------- plane 1 below
+1.000+0.000i +4.000+0.000i +7.000+0.000i +10.000+0.000i
+13.000+0.000i +16.000+0.000i +19.000+0.000i +22.000+0.000i
+25.000+0.000i +28.000+0.000i +31.000+0.000i +34.000+0.000i
+37.000+0.000i +40.000+0.000i +43.000+0.000i +46.000+0.000i
---------------------------------------------------------------------------- plane 2 below
+2.000+0.000i +8.000+0.000i +8.000+0.000i +8.000+0.000i
+26.000+0.000i +32.000+0.000i +32.000+0.000i +32.000+0.000i
+26.000+0.000i +32.000+0.000i +32.000+0.000i +32.000+0.000i
+26.000+0.000i +32.000+0.000i +32.000+0.000i +32.000+0.000i
$
(*) You can argue, if you wish, that that the complex-conjugate symmetry feature of the C2R and R2C transforms should account for a correct mapping of all possible complex "frequency" domain signals into unique, purely real "time" domain signals. I claim, without proof, that it does not, with 2 data points:
(2*(N/2+1))/N
), it stands to reason that there cannot be a unique 1:1 mapping of all possible complex signals into unique real signals. And the unique 1:1 mapping would be necessary for full reversibility.For additional background on the possibility of lack of symmetry in random data, note the discussion around CUFFT_COMPATIBILITY_FFTW_ASYMMETRIC
in the cufft documentation.