How can I use FFTW with the Eigen library?

I'm trying to learn how to use the FFTW library with Eigen. I don't want to use Eigen's unsupported module since I'd eventually like to incorporate FFTW's wisdom features into my code. However, I'm struggling on implementing a very basic example. Here is my code so far:

void fft(Eigen::Ref<Eigen::VectorXcd> inVec, int N) {

    fftw_complex *in, *out;
    fftw_plan p;

    in = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
    out = (fftw_complex*) fftw_malloc(sizeof(fftw_complex) * N);
    p = fftw_plan_dft_1d(N, in, in, FFTW_FORWARD, FFTW_ESTIMATE);

    in = reinterpret_cast<fftw_complex*>(inVec.data());

    fftw_execute(p);

    fftw_destroy_plan(p);

    // reassign input back to inVec here

    fftw_free(in); fftw_free(out);
}

This is essentially copied from chapter 2.1 in fftw's documentation. The documentation seems to say that its basic interface can't operate on different datasets, but you also have to initialize the data after creating a plan. I don't understand this point. Why not simply overwrite the data you initialize the first time with new data and execute the plan again?

Also, I've tried to cast an Eigen vector to fftw_complex, but I suspect I've made a mistake here. In Eigen's unsupported FFTW module, there is also a const_cast call. Why would that be there, and is that necessary since my underlying data is not const here?

Finally, if in is a pointer to fftw_complex data, how can I reassign it to my inVec and then free in?

Solution

I don't understand this point. Why not simply overwrite the data you initialize the first time with new data and execute the plan again?

Yes, that is exactly what fftw wants you to do. The line in = reinterpret_cast<fftw_complex*>(inVec.data()); just sets a pointer. It doesn't copy the array. You need to memcpy the content over, meaning memcpy(in, invec.data(), N * sizeof(fftw_complex));

What you want (and that is somewhat hidden in the FFTW documentation), is the "new-array execute function" that allows you to specify a different array for the call. However, note this line in the documentation:

The alignment issue is especially critical, because if you don’t use fftw_malloc then you may have little control over the alignment of arrays in memory. For example, neither the C++ new function nor the Fortran allocate statement provide strong enough guarantees about data alignment. If you don’t use fftw_malloc, therefore, you probably have to use FFTW_UNALIGNED (which disables most SIMD support). If possible, it is probably better for you to simply create multiple plans (creating a new plan is quick once one exists for a given size), or better yet re-use the same array for your transforms.

When you use Eigen, that is probably not an issue because Eigen also aligns its arrays. GCC also uses 16 byte alignment, so you are probably fine, even when calling malloc, but that depends on your platform. Your function interface doesn't guarantee alignment, though, because an Eigen::Ref may be a segment of a larger array. But you can check at runtime. Something like this:

unsigned flags = FFTW_ESTIMATE;
if(reinterpret_cast<size_t>(invec.data()) % 16)
    flags |=  FFTW_UNALIGNED;
p = fftw_plan_dft_1d(..., flags);

The warning about loss of vectorization may be outdated. Proper alignment is less crucial today (especially on anything supporting AVX) and I suspect (but have not verified) that FFTW will vectorize with unaligned memory access, too.

Side note: In the line p = fftw_plan_dft_1d(N, in, in, FFTW_FORWARD, FFTW_ESTIMATE);, the third argument should be out, not in; unless you want an in-place transformation, in which case you don't need to allocate the out array.

EDIT: The check for alignment may break. I've just checked the source code and FFTW may define different alignments depending on compile flags. I don't see a good way to figure out what alignment it uses. It may be 32 byte for AVX or 64 byte for AVX-512.

On the other hand, the fftw_alignment_of function that is provided to check for this kind of stuff does a simple modulo 16 like my code above. So I don't know. Breakage should be very obvious. It will just crash, not result in invalid results.