Julia absolute newcomer here with a question for you.
I'm teaching myself some Julia by porting some of my Mathematica and Python code (mostly scientific computations in physics etc.), and seeing what's what. So far things have been pretty smooth. And fast. Until now.
Now I'm simulating an elementary lock-in amplifier, which, in essence, takes a (possibly very complicated) time-dependent signal Uin(t) and produces an output Uout(t), phase-locked at some reference frequency fref (that is, it highlights the component of Uin(t) which has a certain phase relationship with a reference sine wave). The description matters little; what matters is that it basically does that by calculating the integral (I'm omitting the phase here for clarity):

Uout(t) = (1/T) ∫ from t−T to t of Uin(s) · sin(2π fref s) ds
So, I set out and tested this in Mathematica and Julia:
I define a mock-up Uin(t), pass some parameter values, and then build an array of Uout(t), at time t = 0, for a range of fref.
Julia: I used the QuadGK package for numerical integration.
using QuadGK
T = 0.1
f = 100.
Uin(t) = sin(2pi * f * t) + sin(2pi * 2f * t)
Uout(t, fref)::Float64 = quadgk(s -> Uin(s) * sin(2pi * fref * s), t-T, t, rtol=1e-3)[1]/T
frng = 80.:1.:120.
print(@time broadcast(Uout, 0., frng))
Mathematica
T = 0.1;
f = 100.;
Uin[t_] := Sin[2 π f t] + Sin[2 π 2 f t]
Uout[t_, fref_] := NIntegrate[Sin[2 π fref s] Uin[s], {s, t - T, t}]/T
frng = Table[i, {i, 80, 120, 1}];
Timing[Table[Uout[0, fr], {fr, frng}]]
Results:
Julia timed the operation anywhere between 45 and 55 seconds, on an i7-5xxx laptop on battery power, which is a lot, while Mathematica did it in ~2 seconds. The difference is abysmal and, honestly, hard to believe. I know Mathematica has some pretty sweet and refined algorithms in its kernel, but Julia is Julia. So, question is: what gives?
P.S.: setting f and T as const does reduce Julia's time to ~8-10 seconds, but f and T cannot be consts in the actual program. Other than that, is there something obvious I'm missing?
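(For illustration, here is a minimal sketch of why const matters, with hypothetical names that aren't part of my actual program: non-const globals have an unknown type at compile time, so every use of them is dynamically dispatched; passing them as function arguments is the usual workaround.)

```julia
# Illustrative sketch: a non-const global vs. the same value passed as an argument.
g_f = 100.0                          # non-const global: its type can change at any time
slow_uin(t) = sin(2pi * g_f * t)     # the compiler can't assume g_f::Float64
fast_uin(t, f) = sin(2pi * f * t)    # f's type is known from the call site

slow_uin(0.25) == fast_uin(0.25, 100.0)   # same value, very different performance
```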
EDIT Feb 2, 2020:
The slowdown seems to be due to the algorithm trying to hunt down precision when the value is near zero, e.g. see below: for fref = 95 the calculation takes one whole second(!), while for adjacent frequency values it computes instantly (the returned result is a tuple of (res, error); it seems the quadgk function stalls at very small values):
0.000124 seconds (320 allocations: 7.047 KiB)
fref = 94.0 (-0.08637214864144352, 9.21712218998258e-6)
1.016830 seconds (6.67 M allocations: 139.071 MiB, 14.35% gc time)
fref = 95.0 (-6.088184966010742e-16, 1.046186419361636e-16)
0.000124 seconds (280 allocations: 6.297 KiB)
fref = 96.0 (0.1254003757465191, 0.00010132083518769636)
Notes: this happens regardless of what tolerance I ask for. Also, Mathematica generally hits machine-precision tolerances by default, while somewhat slowing down at near-zeros, and numpy/scipy just fly through the whole thing, but produce less precise results than Mathematica (at default settings; I didn't look much into this).
Your problem is related to the choice of error tolerance. A relative error of 1e-3 doesn't sound so bad, but it actually is when the integral is close to zero. In particular, this happens when fref = 80.0 (and 85, 90, 95, but not 100, 105, etc.):
julia> Uout(0.0, 80.0, f, T)
1.2104987553880609e-16
To quote from the docstring of quadgk:
(Note that it is useful to specify a positive atol in cases where norm(I) may be zero.)
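The quoted advice can be sketched as follows (my paraphrase of the standard convergence test, not QuadGK's actual source): refinement stops once the error estimate satisfies err <= max(atol, rtol * |I|), so with atol = 0 a near-zero integral can never converge.

```julia
# Sketch of the usual adaptive-quadrature stopping rule.
converged(I, err; rtol=1e-3, atol=0.0) = err <= max(atol, rtol * abs(I))

converged(-0.0864, 1e-5)            # true:  rtol * |I| ≈ 8.6e-5 > 1e-5
converged(1e-16, 1e-5)              # false: rtol * |I| ≈ 1e-19, unreachable
converged(1e-16, 1e-7, atol=1e-6)   # true:  atol gives the algorithm a way out
```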
Let's try to set an absolute tolerance, for example 1e-6, and compare. First the code (using the code from @ARamirez):
Uin(t, f) = sin(2π * f * t) + sin(4π * f * t)
function Uout(t, fref, f, T)
    quadgk(s -> Uin(s, f) * sin(2π * fref * s), t-T, t, rtol=1e-3)[1]/T
end
function Uout_new(t, fref, f, T) # with atol
    quadgk(s -> Uin(s, f) * sin(2π * fref * s), t-T, t, rtol=1e-3, atol=1e-6)[1]/T
end
Then the benchmarking (using BenchmarkTools for this):
using BenchmarkTools
T = 0.1
f = 100.0
freqs = 80.0:1.0:120.0
@btime Uout.(0.0, $freqs, $f, $T);
6.302 s (53344283 allocations: 1.09 GiB)
@btime Uout_new.(0.0, $freqs, $f, $T);
1.270 ms (11725 allocations: 262.08 KiB)
OK, that's 5000 times faster. Is that ok?
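As a side check (my own sketch, not part of the benchmark above): even a plain composite Simpson's rule confirms that the fref = 95 integral really is essentially zero, which is why a purely relative tolerance can never be met there.

```julia
# Composite Simpson's rule; n must be even.
function simpson(g, a, b; n=10_000)
    h = (b - a) / n
    (g(a) + g(b) +
     4 * sum(g(a + i*h) for i in 1:2:n-1) +
     2 * sum(g(a + i*h) for i in 2:2:n-2)) * h / 3
end

uin(s, f) = sin(2pi * f * s) + sin(2pi * 2f * s)
uout_simpson(t, fref, f, T) =
    simpson(s -> uin(s, f) * sin(2pi * fref * s), t - T, t) / T

uout_simpson(0.0, 94.0, 100.0, 0.1)   # ≈ -0.0864, matching the value in the question's edit
uout_simpson(0.0, 95.0, 100.0, 0.1)   # ≈ 0: rtol=1e-3 of this is unattainable
```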