Overhead when using vector-valued functions in Julia

I have noticed that, in Julia, the execution speed of my code slows down dramatically when using vector-valued functions. Consider the following MWE:

using BenchmarkTools

function f_scalar(x::Float64)::Float64
    return -x
end

function f_vector(x::Float64)::Array{Float64,1}
    return [-x,0.0]
end

function trajectory_scalar(x0::Float64,t::Float64,dt::Float64)
   x = x0
   nsteps = convert(Int,t/dt)
   xvec = zeros(nsteps)
   for k in 1:nsteps
      x = x + f_scalar(x)*dt
      xvec[k] = x
   end
   return xvec
end


function trajectory_vector(x0::Float64,t::Float64,dt::Float64)
   x = x0
   nsteps = convert(Int,t/dt)
   xvec = zeros(nsteps)
   for k in 1:nsteps
      x = x + f_vector(x)[1]*dt
      xvec[k] = x
   end
   return xvec
end

@btime trajectory_scalar(2.0,10.0,0.01)      #1.140 \mu s (1 allocation: 7.94 KiB)

@btime trajectory_vector(2.0,10.0,0.01)      #12.800 \mu s (1001 allocations: 86.06 KiB)

The code involving the vector-valued function is an order of magnitude slower. I guess this is due to greatly increased amount of memory allocations; it seems that a new vector is allocated every time the function is called.

If so, is there any way of avoiding this overhead and still use vector-valued functions? In the example above, the vector-valued function is obviously not needed, but, in practice, I want to perform simulations of vector-valued stochastic differential equations, so it would be very convenient to actually write the drift vector as a vector and not having to call a different function for every component.

EDIT: Using August's comment, the vectorized version indeed becomes as fast as the non-vectorized one when using StaticArrays:

function f_vector(x::Float64)::SVector{2,Float64}
    return SVector(-x,0.0)
end

@btime trajectory_vector(2.0,10.0,0.01)      # 1.200 \mu s (1 allocation: 7.94 KiB)

Solution

Use a StaticArray (from the StaticArrays.jl) for small fixed-size vectors like this (i.e. vectors whose size is known at compile-time).

The fact that the size is known at compile time allows the compile to put StaticArrays on the stack or in registers rather than doing a heap allocation (required for something like Vector with a runtime size).

Note that your code is not runnable because d is not defined. I hope it's not a global! Untyped/nonconstant global variables are slow and should be avoided in performance-critical code. If I use const d = 100 then it runs, and if I change f_vector to:

using StaticArrays
function f_vector(x::Float64)
    return SA[-x,0.0]
end

then trajectory_vector runs at the basically the same speed as trajectory_scalar.

Note that all of your argument and return type declarations do nothing for performance. That's now how Julia works; it just makes your code less flexible.