import numpy as np
import scipy.signal as sig
from scipy.fft import fft
from timeit import default_timer as dtime
dtype = 'float32'
n_fft = 598
A = np.random.randn(n_fft, 160000).astype(dtype)
v0 = sig.windows.dpss(n_fft, 4).astype(dtype)
v1 = sig.windows.dpss(n_fft, n_fft // 8).astype(dtype)
v = v1
#%%###############################################################
t0 = dtime()
fft(A)
print(dtime() - t0)
A *= v.reshape(-1, 1)
#%%###############################################################
t0 = dtime()
fft(A)
print(dtime() - t0)
>>> 1.3161122000001342
>>> 4.751361799999813
The two timings are equal if using v = v0 or dtype = 'float64' instead. Why does this happen? (more times)
Note: a workaround is v = v1 + 1, v -= 1, but this shouldn't be necessary... filed an issue.
Win 10 x64, numpy 1.18.5, scipy 1.6.1, Python 3.7.9.
This is caused by denormals (extremely small non-zero numbers), which make some CPU instructions run much slower (details). A workaround is to zero them manually, as with the +1/-1 trick above, or 'safely' via e.g. ftz (applied after type casting):
from ftz import ftz
ftz(v)
A *= v.reshape(-1, 1)
t0 = dtime()
fft(A)
print(dtime() - t0)
>>> 1.4638332999998056
>>> 1.4597183999999288
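To check whether an array actually contains denormals, something like the sketch below can help. It uses only numpy, and the helper names count_subnormals and flush_subnormals are made up for illustration (they are not part of numpy, scipy, or ftz). It relies on np.finfo(dtype).tiny being the smallest positive normal value: about 1.18e-38 for float32 and about 2.2e-308 for float64. That is consistent with the question's observation that the slowdown disappears with dtype = 'float64', since a value that is denormal in float32 can be an ordinary normal number in float64.

```python
import numpy as np

def count_subnormals(x):
    """Count non-zero entries smaller in magnitude than the smallest normal value."""
    x = np.asarray(x)
    tiny = np.finfo(x.dtype).tiny  # smallest positive *normal* number for this dtype
    mag = np.abs(x)
    return int(np.count_nonzero((mag > 0) & (mag < tiny)))

def flush_subnormals(x):
    """Set subnormal entries of a float array to zero, in place (hypothetical helper)."""
    tiny = np.finfo(x.dtype).tiny
    x[np.abs(x) < tiny] = 0
    return x

# 1e-40 is a normal number in float64 but can only be stored as a denormal in float32
x64 = np.array([1.0, 1e-20, 1e-40, 0.0], dtype='float64')
x32 = x64.astype('float32')
print(count_subnormals(x64), count_subnormals(x32))

flushed = flush_subnormals(x32.copy())
print(count_subnormals(flushed))
```

Calling flush_subnormals(v) after the astype(dtype) cast and before A *= v.reshape(-1, 1) should zero the same entries as the +1/-1 trick while leaving all other values untouched; this has not been timed on the setup above.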