Single precision rfft

I'm looking for a single-precision rfft to speed up computation. scipy.fftpack.rfft does this, but it returns a real array that packs the real and imaginary components along the same axis, which requires a post-processing step. I implemented the function below to obtain the standard complex array, but NumPy's rfft ends up being faster for 2D inputs (though slower for 1D). Memory is also a concern: I run out of memory (OOM) with float64.

Does SciPy or another library have a single-precision rfft implementation that returns the standard complex array? (If not, can the code below be made faster?)

import numpy as np
from numpy.fft import rfft
from scipy.fftpack import rfft as srfft

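def rfft_sp(x):  # assumes x is 2D and len(x) is even
    xf = np.zeros((len(x)//2 + 1, x.shape[1]), dtype='complex64')
    # packed layout along axis 0: [y(0), Re y(1), Im y(1), ..., Re y(n/2)]
    h = srfft(x, axis=0)
    xf[0] = h[0]               # DC term (purely real)
    xf[1:] = h[1::2]           # real parts of bins 1 .. n/2
    xf[:1].imag = 0            # DC bin has no imaginary part
    xf[-1:].imag = 0           # Nyquist bin has no imaginary part
    xf[1:-1].imag = h[2::2]    # imaginary parts of bins 1 .. n/2 - 1
    return xf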

x = np.random.randn(500, 100000).astype('float32')

%timeit rfft_sp(x)
%timeit rfft(x, axis=0)

>>> 565 ms ± 15.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> 517 ms ± 22.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Solution

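scipy.fft (as opposed to the older scipy.fftpack) works in single precision: scipy.fft.rfft on a float32 input returns a complex64 array in the standard layout, so no unpacking step is needed.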
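
A minimal sketch of that check, assuming a SciPy version that ships the scipy.fft module. The input shape is smaller than in the question and is chosen only for illustration; the tolerance in the comparison is likewise an arbitrary choice for float32 data.

```python
import numpy as np
from scipy.fft import rfft  # scipy.fft, not scipy.fftpack

# Small float32 input; the shape is illustrative, not the one from the question.
rng = np.random.default_rng(0)
x = rng.standard_normal((500, 1000)).astype(np.float32)

# Transform along axis 0, as in the question.
xf = rfft(x, axis=0)

print(xf.dtype)  # complex64
print(xf.shape)  # (251, 1000)

# Sanity check against a double-precision reference, with a loose tolerance
# because the single-precision result carries float32 rounding error.
ref = np.fft.rfft(x.astype(np.float64), axis=0)
matches = np.allclose(xf, ref, rtol=1e-3, atol=1e-3)
print(matches)
```

scipy.fft.rfft also accepts a workers argument for multi-threaded transforms over batched input, which may help with the 2D case, though whether it does on a given machine would need to be timed.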